Abstract

In this paper, we concentrate on new methodologies for copulas introduced and developed by Joe, Cooke, Bedford, Kurowicka, Daneshkhah and others on the new class of graphical models called vines as a way of constructing higher-dimensional distributions. We develop the approximation method presented by Bedford et al. (2012), in which they show that any n-dimensional copula density can be approximated arbitrarily well pointwise using a finite parameter set of 2-dimensional copulas in a vine or pair-copula construction. Our constructive approach involves the use of minimum information copulas that can be specified to any required degree of precision based on the available data or experts' judgements. By using this method, we are able to employ a fixed finite-dimensional family of copulas in a vine construction, with the promise of a uniform level of approximation.

The basic idea behind this method is to use a two-dimensional ordinary polynomial series to approximate any log-density of a bivariate copula function by truncating the series at an appropriate point. We present an alternative approximation of the multivariate distribution of interest by considering orthonormal polynomials and Legendre multiwavelets as the basis functions. We show that the derived approximations are more precise and computationally faster, with better properties than the one proposed by Bedford et al. (2012). We then apply our method to modelling a dataset of Norwegian financial data that was previously analysed in a series of papers, and finally compare our results with theirs.

Keywords: copula, entropy, expert judgement, information, Legendre multiwavelets, orthonormal polynomial series, pair-copula construction, uncertainty modelling, vine
1 Introduction



Bedford and Cooke (2001, 2002) introduce a probabilistic construction of multivariate distributions based on a simple graphical model called a vine. This model represents an entirely new approach to building complicated, highly dependent multivariate models, and is reminiscent of classical hierarchical modelling. The principle behind the vine construction is to model dependency using simple local building blocks based on conditional independence (e.g. cliques in random fields). Aas et al. (2009) called these building blocks pair-copulae. They use the pair-copula decomposition of a general multivariate distribution and propose a method to perform inference.

They investigate modelling complicated high-dimensional data by fitting different parametric bivariate copulas to construct the corresponding pair-copula model. Although there is a huge number of parametric bivariate copulas, it is well known that building higher-dimensional copulae is generally a difficult problem, and choosing a parametric family for a given higher-dimensional copula is more difficult and limited still (see Embrechts et al., 2003). The pair-copula approach therefore reduces the problem of choosing a parametric higher-dimensional copula to fitting parametric bivariate copulas to data. Bedford et al. (2012) stated that the use of a copula to model dependency is simply a translation of one difficult problem into another: instead of the difficulty of specifying the full joint distribution we have the difficulty of specifying the copula. The main advantage is the technical one that copulas are normalized to have support on the unit square and uniform marginal distributions. However, when copulas are restricted to a particular parametric class (e.g., Gaussian, multivariate Student t, etc.), their potential flexibility is not realized in practice.

To overcome this difficulty, Bedford et al. (2012) proposed an alternative approach in which a vine structure can be used to approximate any given multivariate copula to any required degree of approximation. This method can be easily implemented in practice. The only assumption required is that the multivariate copula density of interest is continuous and non-zero.

This method is constructive and involves the use of minimum information copulas that can be determined to any required degree of precision based on the available data or expert judgements. It can be shown that good approximation 'locally' guarantees good approximation globally. It can also be shown that a vine structure imposes no restrictions on the underlying joint probability distribution it represents (Bedford et al., 2012). Furthermore, Kurowicka and Joe (2011) noted that it is essential to address the question of which vine structure is most appropriate, since some structures allow the use of less complex conditional copulas than others. Conversely, if we only allow certain families of copulas, then one vine structure might fit better than another. This question is still open and under study, and is beyond the scope of this paper.

Thus, if there is any difficulty in fitting a multivariate distribution with a pair-copula model, the problem is related not to the vine structure but to the copulae or conditional copulae. As a result, the question "does a vine structure fit" only makes sense in the context of a given family of copulae. Therefore, we need a class of copulae with which we can approximate any given copula to an arbitrary degree.

A natural way to build a minimum information copula, or to specify dependency constraints, is through the use of moments (Bedford, 2006). These can be specified either on the copula or on the underlying bivariate density. Following Bedford et al. (2012), we consider moment constraints in which real-valued functions $\phi_1, \dots, \phi_k$ are required to take expected values $c_1, \dots, c_k$, respectively. We then fit a minimum information copula, that is, a copula which satisfies a set of constraints as above and which has minimum information (with respect to the uniform, or independence, copula $C(u,v) = uv$) amongst the class of all copulas satisfying those constraints. It is trivial to show that this copula is the "most independent" bivariate density that satisfies these constraints. In addition, a specification of minimum information bivariate copulas naturally leads to minimum information vine distributions. In particular, it can be shown that if each pair-copula is the minimum information copula satisfying its (local) constraints (on moments, rank correlation, etc.), then the resulting joint distribution is also minimally informative given those constraints (see Kurowicka and Cooke, 2006).
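A minimal Python sketch of the quantity being minimized, assuming a discretized copula given as an $n \times n$ grid of cell masses on equal cells with midpoints $(i - 1/2)/n$: it computes the relative information $I = \sum_{ij} p_{ij} \log(n^2 p_{ij})$ of the grid with respect to the uniform copula. The helper names and the Farlie-Gumbel-Morgenstern (FGM) family used as a test case are our own illustrative choices, not part of the original method.

```python
import math


def relative_information(p):
    """Relative information of an n x n grid of cell masses p (summing to 1)
    with respect to the uniform copula, whose cells each have mass 1 / n**2."""
    n = len(p)
    return sum(pij * math.log(pij * n * n)
               for row in p for pij in row if pij > 0.0)


def fgm_cells(n, theta):
    """Illustrative test case (our choice, hypothetical helper): cell masses of the
    FGM copula density c(u, v) = 1 + theta (1 - 2u)(1 - 2v) at cell midpoints."""
    mids = [(i + 0.5) / n for i in range(n)]  # assumption: midpoint grid
    return [[(1.0 + theta * (1 - 2 * u) * (1 - 2 * v)) / n ** 2 for v in mids]
            for u in mids]
```

The independence grid (`theta = 0`) has zero relative information, and the FGM grids become more informative as $\theta$ grows; a minimum information copula picks, among all copulas meeting its constraints, the one with the smallest such value.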

In order to calculate the minimum information copula associated with the constraints mentioned above, Bedford and Meeuwissen (1997) use an iterative numerical method called the D1AD2 algorithm. The number and type of the real-valued functions $\phi_1, \dots, \phi_k$ control both the accuracy of the approximation and the cost of computation. Bedford et al. (2012) develop this method by using ordinary polynomial bases to approximate a multivariate distribution of interest.

The main objective of this paper is to improve the density approximation proposed by Bedford et al. (2012) by considering several other bases, including orthonormal polynomial series and Legendre multiwavelets, and to examine their properties and possible applications. Using orthonormal polynomial basis functions increases the accuracy of the approximation and considerably decreases the computational cost. We will show that orthonormal polynomial bases are more convenient for calculation than other natural bases (e.g. the ordinary polynomial series).

In addition to the orthonormal polynomial bases, which exhibit very nice properties and are efficient to implement in practice, we can improve the approximation of a multivariate density even further using wavelets, which have recently been used for density estimation. Wavelets have become popular due to their ability to approximate a large class of functions, including those with localized, abrupt variations. However, a well-known attribute of wavelet bases is that they cannot be simultaneously symmetric, orthogonal, and compactly supported. Multiwavelets, a more general, vector-valued construction of wavelets, overcome this disadvantage, making them natural choices for estimating density functions, many of which exhibit local symmetries around features such as a mode. In particular, using Legendre multiwavelets as basis functions improves the accuracy of the approximation substantially and decreases the computational cost considerably, even in comparison with the orthonormal polynomial bases. We show the efficiency of our method using the bases mentioned above by comparing them with the model developed by Bedford et al. (2012) and the one proposed by Aas et al. (2009) for modelling the Norwegian financial data, which has also been studied by these authors.

The paper is organised as follows. In Section 2, we introduce the pair-copula decomposition associated with a multivariate distribution of interest. As an example, we also present a vine structure for the Norwegian financial data in this section. We briefly study the minimum information copula and the approximation approach presented by Bedford et al. (2012) in Section 3. In Section 4, we develop the minimum information copula based approximation method to estimate the corresponding multivariate distribution. In Section 5, we develop this method using orthonormal polynomial series (obtained by the Gram-Schmidt method) and Legendre multiwavelets as the basis functions, and we also illustrate how to construct the Legendre multiwavelet basis. In Section 6, we apply our method based on these new bases to modelling Norwegian financial returns data. We also exhibit the potential flexibility of our approach by comparing it with other methods. Future directions of this work and some other conclusions are given in Section 7.

2 Vine Constructions of multiple dependence 

Kurowicka and Cooke (2006) highlighted that, although copula families such as the exchangeable multivariate Archimedean copulas or the nested Archimedean constructions constitute a huge improvement, they are still not rich enough to model all possible mutual dependencies amongst the $n$ variables. This is also illustrated by Aas et al. (2009) and Bedford et al. (2012). Therefore, a more flexible structure, called the pair-copula construction or vine, has been proposed; it allows for the free specification of $n(n-1)/2$ copulae and is hierarchical in nature. This modelling structure is based on a decomposition of a multivariate density into a cascade of bivariate copulae.

In other words, a vine on $n$ variables is a nested set of trees, where the edges of tree $j$ are the nodes of tree $j+1$, $j = 1, \dots, n-2$, and each tree has the maximum number of edges. A regular vine on $n$ variables is a vine in which two edges in tree $j$ are joined by an edge in tree $j+1$ only if these edges share a common node, $j = 1, \dots, n-2$. There are $n(n-1)/2$ edges in a regular vine on $n$ variables. The formal definitions of vine and regular vine can be found in Kurowicka and Cooke (2006). The following theorem expresses a regular vine distribution in terms of its density.

Theorem 1 Let $\mathcal{V} = (T_1, \dots, T_{n-1})$ be a regular vine on $n$ elements, where $T_1$ is a connected tree with nodes $N_1 = \{1, \dots, n\}$ and edges $E_1$, and for $i = 2, \dots, n-1$, $T_i$ is a connected tree with nodes $N_i = E_{i-1}$ and edges $E_i$. For each edge $e(j,k) \in T_i$, $i = 1, \dots, n-1$, with conditioned set $\{j,k\}$ and conditioning set $D_e$, let the conditional copula and copula density be $C_{jk|D_e}$ and $c_{jk|D_e}$, respectively. Let the marginal distributions $F_i$ with densities $f_i$, $i = 1, \dots, n$, be given. Then the vine-dependent distribution is uniquely determined and has a density given by

$$ f(x_1, \dots, x_n) = \prod_{i=1}^{n} f_i(x_i) \prod_{i=1}^{n-1} \prod_{e(j,k) \in E_i} c_{jk|D_e}\big(F_{j|D_e}(x_j \mid x_{D_e}),\, F_{k|D_e}(x_k \mid x_{D_e})\big). \qquad (1) $$

Proof. See Bedford and Cooke (2001).

Figure 1: A regular vine with 4 elements

The density decomposition associated with 4 random variables $X = (X_1, \dots, X_4)$ with joint density function $f(x_1, \dots, x_4)$ satisfying the copula-vine structure shown in Figure 1 (this structure is called a D-vine; see Kurowicka and Cooke, 2006, p. 93), with marginal densities $f_1, \dots, f_4$, is as follows:

$$ \begin{aligned} f(x_1, \dots, x_4) = {} & \prod_{i=1}^{4} f_i(x_i) \times c_{12}\{F_1(x_1), F_2(x_2)\}\, c_{23}\{F_2(x_2), F_3(x_3)\}\, c_{34}\{F_3(x_3), F_4(x_4)\} \\ & \times c_{13|2}\{F(x_1 \mid x_2), F(x_3 \mid x_2)\}\, c_{24|3}\{F(x_2 \mid x_3), F(x_4 \mid x_3)\} \\ & \times c_{14|23}\{F(x_1 \mid x_2, x_3), F(x_4 \mid x_2, x_3)\}. \end{aligned} \qquad (2) $$
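Equation (2) can be checked numerically in a simpler setting. The Python sketch below assumes, for illustration only, the three-variable D-vine 1-2-3 with standard normal margins and Gaussian pair-copulas (parameters $\rho_{12}$, $\rho_{23}$ and the partial correlation $\rho_{13|2}$); the conditional arguments $F(x_1 \mid x_2)$ and $F(x_3 \mid x_2)$ are the Gaussian h-functions. In this special case the vine density must equal the trivariate normal density with $\rho_{13} = \rho_{13|2}\sqrt{(1-\rho_{12}^2)(1-\rho_{23}^2)} + \rho_{12}\rho_{23}$, which gives a convenient check. All function names are hypothetical.

```python
import math
from statistics import NormalDist

N = NormalDist()  # assumption: standard normal margins F_i and densities f_i


def gauss_copula_density(u, v, rho):
    """Bivariate Gaussian copula density c(u, v; rho)."""
    a, b = N.inv_cdf(u), N.inv_cdf(v)
    q = (rho * rho * (a * a + b * b) - 2 * rho * a * b) / (2 * (1 - rho * rho))
    return math.exp(-q) / math.sqrt(1 - rho * rho)


def gauss_h(u, v, rho):
    """h-function: conditional cdf F(x_a | x_b) for a Gaussian pair-copula."""
    a, b = N.inv_cdf(u), N.inv_cdf(v)
    return N.cdf((a - rho * b) / math.sqrt(1 - rho * rho))


def dvine3_density(x, r12, r23, r13_2):
    """f1 f2 f3 * c12(F1, F2) * c23(F2, F3) * c13|2(F(x1 | x2), F(x3 | x2))."""
    u = [N.cdf(xi) for xi in x]
    dens = math.prod(N.pdf(xi) for xi in x)
    dens *= gauss_copula_density(u[0], u[1], r12)
    dens *= gauss_copula_density(u[1], u[2], r23)
    u1_2 = gauss_h(u[0], u[1], r12)  # F(x1 | x2)
    u3_2 = gauss_h(u[2], u[1], r23)  # F(x3 | x2)
    return dens * gauss_copula_density(u1_2, u3_2, r13_2)


def mvn3_density(x, r12, r23, r13):
    """Trivariate normal density (zero means, unit variances), for comparison."""
    R = [[1.0, r12, r13], [r12, 1.0, r23], [r13, r23, 1.0]]

    def cof(i, j):  # signed cofactor of entry (i, j)
        rows = [r for r in range(3) if r != i]
        cols = [c for c in range(3) if c != j]
        minor = (R[rows[0]][cols[0]] * R[rows[1]][cols[1]]
                 - R[rows[0]][cols[1]] * R[rows[1]][cols[0]])
        return (-1) ** (i + j) * minor

    det = sum(R[0][j] * cof(0, j) for j in range(3))
    # (R^{-1})_{ij} = cof(j, i) / det; quad is the form x' R^{-1} x
    quad = sum(x[i] * cof(j, i) / det * x[j] for i in range(3) for j in range(3))
    return math.exp(-0.5 * quad) / ((2 * math.pi) ** 1.5 * math.sqrt(det))
```

The same pattern (margins, first-tree copulas, then copulas of h-transformed arguments) extends to the four-variable D-vine in (2) by adding $c_{34}$, $c_{24|3}$ and $c_{14|23}$.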

It is trivial to show that if $f(x_1, \dots, x_n)$ is absolutely continuous with respect to the product $f_1 \cdots f_n$, then it can be represented by any vine-dependent distribution. The existence of regular vine distributions is discussed in detail in Bedford and Cooke (2002). We briefly illustrate how such a distribution is determined, using the regular vine in Figure 1 as an example. We make use of the expression

$$ f(x_1, \dots, x_4) = f(x_1)\, f(x_2 \mid x_1)\, f(x_3 \mid x_1, x_2)\, f(x_4 \mid x_1, x_2, x_3). $$



The marginal distribution of $X_1$ is known, so we have $f_1$. The marginals of $X_1$ and $X_2$ are known, and the copula of $X_1, X_2$ is also known, so we can get $f(x_1, x_2)$, and hence $f(x_2 \mid x_1)$. In order to get $f(x_3 \mid x_1, x_2)$, we can determine $f(x_3 \mid x_2)$ in the same way as $f(x_2 \mid x_1)$. Next we calculate $f(x_1 \mid x_2)$ from $f(x_1, x_2)$. With $f(x_1 \mid x_2)$, $f(x_3 \mid x_2)$, and the conditional copula of $X_1, X_3$ given $X_2$, we can determine the conditional joint distribution $f(x_1, x_3 \mid x_2)$, and hence the conditional marginal $f(x_3 \mid x_1, x_2)$. Progressing in this way we obtain $f(x_4 \mid x_1, x_2, x_3)$. As a result, we can state the following theorem.

Theorem 2 Given a distribution with density function $f(x_1, \dots, x_n)$ and a vine $\mathcal{V}$ on $n$ elements, there are copulae $C_{jk|D_e}$ such that (1) is satisfied, that is,

$$ f(x_1, \dots, x_n) = \prod_{i=1}^{n} f_i(x_i) \prod_{i=1}^{n-1} \prod_{e(j,k) \in E_i} c_{jk|D_e}\big(F_{j|D_e}(x_j \mid x_{D_e}),\, F_{k|D_e}(x_k \mid x_{D_e})\big). $$

Proof. The general case follows the same steps as the explanation given above for building a 4-dimensional multivariate distribution. See also Bedford et al. (2012) and references therein.

The above theorem gives us a constructive approach to build a multivariate distribution given 
a vine structure: If we make choices of marginal densities and copulae then the above formula will 
give us a multivariate density. Hence vines can be used to model general multivariate densities. 
However, in practice we have to use copulae from a convenient class, and this class should ideally be 
one that allows us to approximate any given copula to an arbitrary degree. In the following sections, 
we address this issue in more detail. With such a class of copulae, we can then approximate any multivariate distribution using any vine structure.

Unlike the situation with Bayesian networks, where not all structures can be used to model a 
given distribution, the theorem shows that - in principle - any vine structure may be used to model 
a given distribution. However, in practice it seems that some vine structures do work better than 
others, and so this must be a result of restricting to a particular family of copulas. That is, given a 
family of copulae, some vine structures may give a better degree of approximation than others. In 
fact, we could say that the question "does a vine structure fit" only makes sense in the context of 
a given family of copulae. 

3 Building bivariate minimum information copulae 

This section sets out to show that we can use the minimum information techniques originating from Bedford and Meeuwissen (1997), in conjunction with observed data or expert elicitation of observables, to define a copula that can be used to build the joint distribution of two random variables. The method described below is based on using the D1AD2 algorithm to determine the copula in terms of potentially asymmetric information about the two variables of interest.
3.1 The D1AD2 algorithm and minimum information copula

Bedford and Meeuwissen (1997) applied a so-called DAD algorithm to produce a discretized minimally informative copula between two variables with a given rank correlation. This approach relies on the fact that the rank correlation is determined by the mean of the symmetric function $UV$. The same approach can be used whenever we wish to specify the expectation of any symmetric function of $U$ and $V$ (see Bedford, 2006; Lewandowski, 2008).

This method can be developed further using the idea stated in Borwein et al. (1994), which enables asymmetric specifications. In the revised method, we first determine a positive square matrix $A$, also called a kernel, and then find two diagonal matrices $D_1$ and $D_2$ such that the product $D_1 A D_2$ is doubly stochastic. The theory can easily be generalised to continuous functions (see Bedford et al., 2012).

Now, suppose there are two random variables $X$ and $Y$, with cumulative distribution functions $F_X$ and $F_Y$, respectively. These are the variables of interest that we would like to correlate by introducing constraints based on some knowledge about functions of these variables. Suppose there are $k$ of these functions, namely $h'_1(X,Y), h'_2(X,Y), \dots, h'_k(X,Y)$, and that we either wish to calculate their mean values from the observed data, or the expert wishes to specify mean values $\alpha_1, \dots, \alpha_k$ for these functions, respectively. We can then specify corresponding functions of the copula variables $U$ and $V$, defined by $h_i(U,V) = h'_i(F_X^{-1}(U), F_Y^{-1}(V))$, $i = 1, 2, \dots, k$, where $h_i : [0,1]^2 \to \mathbb{R}$, for which we specify the mean values $\alpha_1, \dots, \alpha_k$ that these functions should simultaneously take. Further suppose that $h_i$ and $h_j$ are linearly independent for $i \neq j$. We seek a copula that has these mean values, a problem which is usually either infeasible or underdetermined. Hence, assuming feasibility for the moment, we also ask that the copula be minimally informative (with respect to the uniform distribution), which guarantees a unique and reasonable solution. We form the kernel

$$ A(u,v) = \exp\big(\lambda_1 h_1(u,v) + \dots + \lambda_k h_k(u,v)\big), \qquad (3) $$

where $u$ denotes a realization of $U$ and $v$ a realization of $V$.
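As a small illustration of the change of scale $h_i(U,V) = h'_i(F_X^{-1}(U), F_Y^{-1}(V))$ and of the kernel (3), the Python sketch below wraps a function defined on the original scale into one on the copula scale. It assumes, purely for illustration, normal margins represented by Python's `statistics.NormalDist`, whose `inv_cdf` method supplies $F^{-1}$; the helper names and the example margins are hypothetical.

```python
import math
from statistics import NormalDist


def to_copula_scale(h_prime, FX, FY):
    """Return h(u, v) = h'(FX^{-1}(u), FY^{-1}(v)); FX, FY must provide inv_cdf."""
    return lambda u, v: h_prime(FX.inv_cdf(u), FY.inv_cdf(v))


def kernel(u, v, lambdas, hs):
    """A(u, v) = exp(lambda_1 h_1(u, v) + ... + lambda_k h_k(u, v)), as in (3)."""
    return math.exp(sum(lam * h(u, v) for lam, h in zip(lambdas, hs)))


# Hypothetical example: an expert states a mean for the cross-product (X - 10) * Y,
# with margins X ~ N(10, 2^2) and Y ~ N(0, 1) assumed for illustration.
FX, FY = NormalDist(10, 2), NormalDist(0, 1)
h1 = to_copula_scale(lambda x, y: (x - 10) * y, FX, FY)
```

Once the $h_i$ live on $[0,1]^2$, the elicited means $\alpha_i$ refer to the copula alone, and the margins play no further role in the construction of $A$.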

For practical implementations, we use the same method as proposed by Bedford et al. (2012) to discretize the set of $(u, v)$ values so that the whole domain of the copula is covered. Thus, the kernel $A$ becomes a 2-dimensional matrix, and two diagonal matrices $D_1$ and $D_2$ are then determined such that the product $P$ below, defined over $[0,1]^2$, is a doubly stochastic matrix which represents a discretized copula density:

$$ P = D_1 A D_2. \qquad (4) $$

The D1AD2 algorithm can be used to generate a unique joint density with uniform marginals for each vector $(\lambda_1, \dots, \lambda_k)$. The set of all possible expectation vectors $(\alpha_1, \dots, \alpha_k)$ that could be taken by $(h_1, h_2, \dots, h_k)$ under some probability distribution is convex, and for every $(\alpha_1, \dots, \alpha_k)$ in the interior of that convex set there is a density with parameters $(\lambda_1, \dots, \lambda_k)$ for which $(h_1, h_2, \dots, h_k)$ take these values (see Borwein et al., 1994; and Bedford et al., 2012).

We now explain the iterative algorithm required to approximate this copula density. Suppose that $u$ and $v$ are each discretized into $n$ points, $u_i$ and $v_j$, $i, j = 1, \dots, n$. Then we write $A = (a_{ij})$, $D_1 = \mathrm{diag}(d^{(1)}_1, \dots, d^{(1)}_n)$ and $D_2 = \mathrm{diag}(d^{(2)}_1, \dots, d^{(2)}_n)$, where $a_{ij} = A(u_i, v_j)$, $d^{(1)}_i = D_1(u_i)$ and $d^{(2)}_j = D_2(v_j)$. The doubly stochastic matrix $D_1 A D_2$ with uniform marginals is characterized by

$$ \sum_{j=1}^{n} d^{(1)}_i d^{(2)}_j a_{ij} = \frac{1}{n}, \quad i = 1, \dots, n, \qquad \text{and} \qquad \sum_{i=1}^{n} d^{(1)}_i d^{(2)}_j a_{ij} = \frac{1}{n}, \quad j = 1, \dots, n. $$

The idea behind the D1AD2 algorithm is very simple: it starts with arbitrary positive initial vectors for $D_1$ and $D_2$, and new vectors are then successively defined by iterating the maps

$$ d^{(1)}_i = \frac{1}{n \sum_{j} d^{(2)}_j a_{ij}}, \quad i = 1, \dots, n, \qquad d^{(2)}_j = \frac{1}{n \sum_{i} d^{(1)}_i a_{ij}}, \quad j = 1, \dots, n. $$

It can be shown that this iteration scheme converges geometrically to the required vectors (see Borwein et al., 1994).

Note that to compare different discretizations (for different $n$) we should multiply each cell weight $d^{(1)}_i d^{(2)}_j a_{ij}$ by $n^2$, as this quantity approximates the continuous copula density with respect to the uniform distribution.
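The iteration above, together with the kernel (3) and the $n^2$ rescaling, can be sketched in plain Python as follows. The sketch assumes the grid points are cell midpoints $u_i = (i - 1/2)/n$ (our choice of discretization, which may differ from that of Bedford et al., 2012), starts from $D_1 = D_2 = I$, and stops once all row sums of $D_1 A D_2$ are within a tolerance of $1/n$; the function names are hypothetical.

```python
import math


def discretized_kernel(n, lambdas, hs):
    """a_ij = exp(sum_m lambda_m h_m(u_i, v_j)) on the n x n grid of cell midpoints
    (assumption: midpoint discretization)."""
    mids = [(i + 0.5) / n for i in range(n)]
    return [[math.exp(sum(lam * h(u, v) for lam, h in zip(lambdas, hs)))
             for v in mids] for u in mids]


def d1ad2(a, tol=1e-13, max_iter=10_000):
    """Alternately rescale rows and columns of the positive matrix a:
    d1_i = 1 / (n sum_j d2_j a_ij),  d2_j = 1 / (n sum_i d1_i a_ij),
    until every row and column of P = D1 A D2 sums to 1/n."""
    n = len(a)
    d1, d2 = [1.0] * n, [1.0] * n
    for _ in range(max_iter):
        d1 = [1.0 / (n * sum(d2[j] * a[i][j] for j in range(n))) for i in range(n)]
        d2 = [1.0 / (n * sum(d1[i] * a[i][j] for i in range(n))) for j in range(n)]
        # columns are exact after the d2 update, so only rows need checking
        row_err = max(abs(d1[i] * sum(a[i][j] * d2[j] for j in range(n)) - 1.0 / n)
                      for i in range(n))
        if row_err < tol:
            break
    return [[d1[i] * a[i][j] * d2[j] for j in range(n)] for i in range(n)]


def copula_density(p):
    """Multiply each cell weight by n^2 so that grids of different sizes compare."""
    n = len(p)
    return [[n * n * pij for pij in row] for row in p]
```

With $h(u,v) = uv$ and $\lambda = 5$, the resulting copula puts more mass near the diagonal, so $E[UV]$ exceeds its independence value $1/4$.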

The mapping from the set of vectors of $\lambda$'s onto the set of vectors of resulting expectations of the functions $(h_1, \dots, h_k)$ has to be found numerically. Bedford and Daneshkhah (2010) and Bedford et al. (2012) proposed optimization techniques to determine the $\lambda_i$'s and the corresponding copula. The expectations $\alpha_i$ of the $k$ functions of the variables $X$ and $Y$ are given by

$$ E[h'_i(X,Y)] = E[h_i(U,V)] = \alpha_i, \quad i = 1, \dots, k. $$

We now wish to determine the appropriate set of $\lambda$'s for given expectations $\alpha_i$, where the expectations are calculated using the discrete copula density $D_1 A D_2$ given in (4). Hence, to determine $\lambda_i$'s satisfying the constraints, the following set of equations has to be solved:

$$ L_i(\lambda_1, \dots, \lambda_k) = \sum_{l=1}^{n} \sum_{j=1}^{n} P(u_l, v_j)\, h_i(u_l, v_j) - \alpha_i = 0, \quad i = 1, 2, \dots, k. \qquad (5) $$

The left-hand sides of the above equations are just functions of the $\lambda$'s, and their roots can be found with optimization algorithms. One possible solver for this task is FSOLVE, MATLAB's optimization routine. An alternative is another MATLAB optimization procedure called FMINSEARCH, which implements the Nelder-Mead simplex method (see Lagarias et al., 1998). The minimized function is then

$$ \sum_{i=1}^{k} L_i^2(\lambda_1, \dots, \lambda_k). $$
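The sketch below solves (5) in the simplest case $k = 1$. Instead of FSOLVE or FMINSEARCH it uses bracketing and bisection, a plain one-dimensional stand-in that relies on $L(\lambda)$ being increasing in $\lambda$ and on $\alpha$ lying in the interior of the feasible set (otherwise the bracketing loop would not terminate). The D1AD2 step is repeated inline so that the block is self-contained; the midpoint grid, the names and the tolerances are assumptions.

```python
import math


def min_info_copula(n, lam, h, tol=1e-12, max_iter=5000):
    """Discrete minimum information copula P = D1 A D2 for one constraint h,
    with kernel a_ij = exp(lam * h(u_i, v_j)) on cell midpoints (assumption)."""
    mids = [(i + 0.5) / n for i in range(n)]
    a = [[math.exp(lam * h(u, v)) for v in mids] for u in mids]
    d1, d2 = [1.0] * n, [1.0] * n
    for _ in range(max_iter):
        d1 = [1.0 / (n * sum(d2[j] * a[i][j] for j in range(n))) for i in range(n)]
        d2 = [1.0 / (n * sum(d1[i] * a[i][j] for i in range(n))) for j in range(n)]
        rows = [d1[i] * sum(a[i][j] * d2[j] for j in range(n)) for i in range(n)]
        if max(abs(r - 1.0 / n) for r in rows) < tol:
            break
    return mids, [[d1[i] * a[i][j] * d2[j] for j in range(n)] for i in range(n)]


def residual(lam, h, alpha, n):
    """L(lambda) = sum_{l,j} P(u_l, v_j) h(u_l, v_j) - alpha, i.e. (5) with k = 1."""
    mids, p = min_info_copula(n, lam, h)
    return sum(p[i][j] * h(mids[i], mids[j])
               for i in range(n) for j in range(n)) - alpha


def solve_lambda(h, alpha, n=16, tol=1e-6):
    """Root of L by bracketing then bisection; assumes L increases in lambda and
    alpha is feasible (hypothetical stand-in for FSOLVE / FMINSEARCH)."""
    lo, hi = -1.0, 1.0
    while residual(lo, h, alpha, n) > 0:
        lo *= 2
    while residual(hi, h, alpha, n) < 0:
        hi *= 2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid, h, alpha, n) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For $h(u,v) = uv$, a target such as $\alpha = 0.27$ corresponds to a Spearman rank correlation of $12\alpha - 3 = 0.24$. With several constraints, a multivariate root finder, or a minimizer of $\sum_i L_i^2$ as above, takes the place of the bisection.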

We refer the interested reader to Lewandowski (2008) and Bedford et al. (2012) for how an expert could specify a copula through defining expected values.

4 Approximating Multivariate Density by Vine 

In this section, we use techniques from approximation theory to show that any $n$-dimensional multivariate density which is twice continuously differentiable (that is, twice differentiable, with continuous second derivatives) can be approximated arbitrarily well pointwise using a finite parameter set of 2-dimensional copulas in a vine construction. The basic idea is that we can use a series expansion, such as a two-dimensional polynomial series, an orthonormal polynomial series or Legendre multiwavelets, to approximate any log-density function by truncating the series at an appropriate point. What is non-trivial about this method, however, is that the same truncation can be used everywhere in a vine construction and gives an overall uniform pointwise approximation. Hence our method allows a fixed finite-dimensional family of copulas to be used in a vine construction, with the promise of a uniform level of approximation. Since the approximations we make of copula densities might not themselves be copula densities, we need to transform them into copulas.

To demonstrate this, we first show that the family of bivariate (conditional) copula densities contained in a given multivariate distribution forms a relatively compact set in the space of continuous functions on $[0,1]^2$. Then it can be shown that the same finite parameter family of copulae can be used to achieve a given level of approximation for all conditional copulae simultaneously.

Here, we develop the approximation method used by Bedford et al. (2012) so that any log-density function can be approximated at the desired level more accurately and with better properties. We first introduce some notation. The basic assumption is that all densities are continuous. We denote by $C(Z)$ the space of continuous real-valued functions on a space $Z$, where $Z = [0,1]^r$ for some $r$, with the norm on $C(Z)$ given by

$$ \|f_{1\dots r}\| = \sup_{(x_1, \dots, x_r) \in Z} |f_{1\dots r}(x_1, \dots, x_r)|. $$

The set of all possible 2-dimensional (conditional) copula densities is denoted by

$$ \mathcal{C}(f) = \{c_{ij|i_1\dots i_r} : 1 \le i, j, i_1, \dots, i_r \le n,\ \dots\}, $$

where $c_{ij|i_1\dots i_r}$ is the copula of the conditional density of $X_i, X_j$ given $X_{i_1}, \dots, X_{i_r}$.
The Arzelà-Ascoli theorem can be used to check the compactness of such function spaces: a family $K \subset C(Z)$ is relatively compact if the functions in $K$ are equicontinuous and pointwise bounded.

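It can be shown that the following two sets are relatively compact (Bedford et al. (2012), Theorem 3):

$$ \mathcal{M}(f) = \{f_{i|i_1\dots i_r} : 1 \le i, i_1, \dots, i_r \le n,\ i \ne i_1, \dots, i_r\} $$

and the set of bivariate conditional densities

$$ \{f_{ij|i_1\dots i_r} : 1 \le i, j, i_1, \dots, i_r \le n,\ \dots\}, $$

where $f_{i|i_1\dots i_r}$ is the conditional density of $X_i$ given $X_{i_1}, \dots, X_{i_r}$, and $f_{ij|i_1\dots i_r}$ is the conditional density of $X_i, X_j$ given $X_{i_1}, \dots, X_{i_r}$.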

It is then straightforward to show that the set $\mathcal{C}(f) \subset C([0,1]^2)$ is relatively compact. In addition, since all the functions in $\mathcal{C}(f)$ are positive and uniformly bounded away from 0, the set $\ln\mathcal{C}(f) = \{\ln(g) : g \in \mathcal{C}(f)\} \subset C([0,1]^2)$ is also relatively compact (see Bedford et al. (2012) for details and proofs).

The space $C([0,1]^2)$ is a vector space, and in this context a basis is simply a sequence of functions $h_1, h_2, \dots \in C([0,1]^2)$ such that any function $g \in C([0,1]^2)$ can be written as $g = \sum_{i=1}^{\infty} \lambda_i h_i$. Using the relative compactness shown above, it can be shown that, given $\varepsilon > 0$, there is a $k$ such that any member of $\ln\mathcal{C}(f)$ (or $\mathcal{C}(f)$) can be approximated to within error $\varepsilon$ by a linear combination of $h_1, h_2, \dots, h_k$. There are many possible bases, for example the ordinary polynomial series

$$ u,\ v,\ uv,\ u^2,\ v^2,\ u^2 v,\ u v^2,\ \dots $$

which was mainly used in Bedford et al. (2012).
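To see what truncating the ordinary polynomial series means in practice, the Python sketch below enumerates the monomials $u^a v^b$ with $1 \le a + b \le d$ (the constant is omitted because it is absorbed by the normalization), giving $k = d(d+3)/2$ functions, and computes their exact Gram matrix $\langle h_p, h_q \rangle = \int_0^1\!\int_0^1 h_p h_q \, du\, dv$. The ordering by total degree and the helper names are our own; the nonzero off-diagonal entries show that this basis is not orthogonal, which is the issue taken up in Section 5.

```python
from fractions import Fraction


def poly_basis(degree):
    """Exponent pairs (a, b) of the monomials u^a v^b with 1 <= a + b <= degree,
    ordered by total degree: u, v, u^2, uv, v^2, ... (hypothetical helper)."""
    return [(d - b, b) for d in range(1, degree + 1) for b in range(d + 1)]


def gram(basis):
    """Exact Gram matrix on [0, 1]^2: int u^(a1+a2) du * int v^(b1+b2) dv."""
    return [[Fraction(1, a1 + a2 + 1) * Fraction(1, b1 + b2 + 1)
             for (a2, b2) in basis] for (a1, b1) in basis]
```

For instance, $\langle u, u \rangle = 1/3$ but $\langle u, v \rangle = 1/4$, so adding a new monomial changes the best coefficients of the earlier ones.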

In the next section, we considerably improve this density approximation based on minimum information techniques by using the orthonormal polynomial series and Legendre multiwavelets, instead of the ordinary polynomial series, as the basis functions. We also exhibit other useful properties of these approximations.

It should be noted that a copula density approximated by the method described above might not itself be a copula density. Therefore, the resulting approximation needs to be transformed into a copula. This can be done by reweighting the approximated density, and one of the most effective reweighting schemes is the D1AD2 algorithm described in the previous section. If $A(u,v)$ is a continuous positive real-valued function on $[0,1]^2$, then there are continuous positive functions $d_1(u)$ and $d_2(v)$ such that $d_1 d_2 A$ is a copula density, that is, it has uniform marginal distributions. This density is called the C-projection of $A$ and is denoted by $C(A)$. Bedford et al. (2012) present the following lemma, which allows us to control the error made when approximating a copula by another function.
Lemma 1 Let $g$ be a non-negative continuous copula density. Given $\varepsilon > 0$, there is a $\delta > 0$ such that if $\|g - f\| < \delta$ then $\|g - C(f)\| < \varepsilon$.
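Lemma 1 can be illustrated on a grid. The Python sketch below applies a discrete C-projection (the D1AD2 rescaling) to a perturbed copula density $f = g + \delta u$, where $g$ is a Farlie-Gumbel-Morgenstern density with $\theta = 0.5$; the test density, the perturbation and the midpoint grid are our own illustrative assumptions, and the helper names are hypothetical. The sup-norm error $\|g - C(f)\|$ over the grid shrinks with $\delta$, as the lemma predicts, and a density that already has uniform margins is left unchanged.

```python
def c_projection(f, tol=1e-13, max_iter=10_000):
    """Discrete C-projection of a positive n x n grid of density values f:
    find diagonal rescalings d1, d2 (D1AD2 iteration) so that every row and
    column of d1_i f_ij d2_j sums to 1/n, then return n^2 * d1_i f_ij d2_j."""
    n = len(f)
    d1, d2 = [1.0] * n, [1.0] * n
    for _ in range(max_iter):
        d1 = [1.0 / (n * sum(d2[j] * f[i][j] for j in range(n))) for i in range(n)]
        d2 = [1.0 / (n * sum(d1[i] * f[i][j] for i in range(n))) for j in range(n)]
        err = max(abs(d1[i] * sum(f[i][j] * d2[j] for j in range(n)) - 1.0 / n)
                  for i in range(n))
        if err < tol:
            break
    return [[n * n * d1[i] * f[i][j] * d2[j] for j in range(n)] for i in range(n)]


# Illustrative setup (our choice): FGM copula density g on a 20 x 20 midpoint
# grid; g + delta * u no longer has uniform margins.
n, theta = 20, 0.5
mids = [(i + 0.5) / n for i in range(n)]
g = [[1 + theta * (1 - 2 * u) * (1 - 2 * v) for v in mids] for u in mids]


def sup_error(delta):
    """Grid sup-norm of g - C(g + delta * u)."""
    f = [[g[i][j] + delta * mids[i] for j in range(n)] for i in range(n)]
    cf = c_projection(f)
    return max(abs(g[i][j] - cf[i][j]) for i in range(n) for j in range(n))
```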



Note that these reweighting functions have the same differentiability properties as the function $f$ being reweighted. This can be seen from the integral equations that they satisfy:

$$ d_1(u) = \frac{1}{\int_0^1 d_2(v)\, f(u,v)\, dv} \quad \text{and} \quad d_2(v) = \frac{1}{\int_0^1 d_1(u)\, f(u,v)\, du}. $$

Finally, expression (1) can be used to see that a good approximation of each conditional copula results in a good approximation of the multivariate density of interest.



5 Building approximations using minimally informative distributions

In this section, we give a practical guide to building a minimally informative vine structure to approximate any multivariate distribution. In the previous section, we presented a method proposed by Bedford et al. (2012) by which all conditional copulae can be approximated using linear combinations of basis functions. In this section, we address the issue of how the appropriate parameter values can be chosen. We also introduce a practical and efficient alternative, based on the minimum information criterion, that lies very close to the approach described above. In other words, given the basis functions $\{1, h_1, \dots, h_k\} : [0,1]^2 \to \mathbb{R}$, we seek values $\lambda_1, \dots, \lambda_k$ so that $\exp(\sum_i \lambda_i h_i)$ is close to the approximated copula density. This can be done by fitting the moments of the $h_i$ in the minimum information framework. Therefore, if $E_g[h_i(u,v)] = \alpha_i$, we seek the minimum information copula density that also has these moments. This copula density can be uniquely determined, using the D1AD2 algorithm, as

$$ d_1(u)\, d_2(v) \exp\Big(\sum_{i=1}^{k} \lambda_i h_i(u,v)\Big). $$

As mentioned above, a multivariate distribution can be modelled by a vine structure, defined as a decomposition of the given multivariate distribution into certain conditional copulae associated with the conditioned and conditioning sets of the vine. The following algorithm summarises the steps to approximate the given multivariate distribution associated with a vine structure:

1. Specify a basis family, denoted by $S(k) = \{h_1, h_2, \dots\}$.

2. Specify a vine structure.

3. For each part of the vine, i.e. each bivariate copula, specify either

• the means $\alpha_1, \dots, \alpha_k$ of $h_1, \dots, h_k$ on each pairwise copula; or

• functions $\alpha_m(\cdot \mid D_e)$, $m = 1, \dots, k$, giving the mean values as functions of the conditioning variables.
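When the means in step 3 are estimated from data rather than elicited, one common route, assumed here and not prescribed by the text, is to rank-transform each margin to pseudo-observations $u_t = \mathrm{rank}(x_t)/(N+1)$ and average each basis function over the sample. The Python sketch below covers only the first (unconditional) option; the helper names and the two-function basis are hypothetical, and ties are broken by order of appearance.

```python
def pseudo_observations(xs):
    """Rank-transform a sample to (0, 1): u_t = rank(x_t) / (N + 1).
    Assumption: ties are broken by order of appearance."""
    order = sorted(range(len(xs)), key=lambda t: xs[t])
    u = [0.0] * len(xs)
    for rank, t in enumerate(order, start=1):
        u[t] = rank / (len(xs) + 1)
    return u


def sample_moments(xs, ys, basis):
    """Empirical means alpha_m = (1/N) sum_t h_m(u_t, v_t) for each h_m in basis."""
    u, v = pseudo_observations(xs), pseudo_observations(ys)
    return [sum(h(ut, vt) for ut, vt in zip(u, v)) / len(u) for h in basis]


# Hypothetical two-function basis on the copula scale: h_1 = uv, h_2 = u^2 v.
basis = [lambda u, v: u * v, lambda u, v: u * u * v]
```

The resulting $\alpha_m$ can then be passed to a root finder for (5), one pair-copula at a time.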

One of the main aspects that affects the aforementioned approximation is the basis family. Here, we examine the impact of two basis families, the orthonormal polynomial series and Legendre multiwavelets, on approximating the minimum information copulae and the multivariate distribution associated with the chosen vine structure. We first briefly introduce these two families of basis functions.

5.1 Constructing Orthonormal Polynomial Basis

In mathematics, particularly numerical analysis, a basis function is an element of a basis for a function space. The term is a degeneration of the term basis vector for a more general vector space; that is, each function in the function space can be represented as a linear combination of the basis functions. We say two polynomial functions $g_1$ and $g_2$ are orthonormal on the interval $[0,1]$ if

$$ \int_0^1 g_1(u)\, g_2(u)\, du = \begin{cases} 1 & \text{for } g_1(u) = g_2(u), \\ 0 & \text{for } g_1(u) \ne g_2(u). \end{cases} \qquad (6) $$

An orthonormal polynomial basis can be more convenient for calculation than some natural bases. In fact, if the basis is orthonormal, adding a new term to the expansion does not change the coefficients of the already computed shorter expansion (Gui, 2009). If the basis is not orthonormal, any new term in general has a nonzero projection on the previous terms, which means that the already computed coefficients would have to be changed. This is one of the reasons we use orthonormal polynomial basis functions as the basis family $S(k)$. It is reasonable to consider the Gram-Schmidt orthonormal polynomial basis, which is one of the best-known orthonormal polynomial bases on $[0,1]$ (its members are the normalized shifted Legendre polynomials).

To construct this orthonormal polynomial basis over the interval $[0,1]$, we use the Gram-Schmidt process as follows.



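$$ \varphi_0(u) = 1, \qquad \varphi_n(u) = \frac{u^n - \sum_{j=0}^{n-1} \left( \int_0^1 u^n \varphi_j(u)\, du \right) \varphi_j(u)}{\left\| u^n - \sum_{j=0}^{n-1} \left( \int_0^1 u^n \varphi_j(u)\, du \right) \varphi_j(u) \right\|}, \quad n \ge 1, $$

where $\|g\| = \left( \int_0^1 g(u)^2\, du \right)^{1/2}$. The first few functions are

$$ \begin{aligned} \varphi_0(u) &= 1, \qquad \varphi_1(u) = \sqrt{3}\,(-1 + 2u), \qquad \varphi_2(u) = \sqrt{5}\,(1 - 6u + 6u^2), \\ \varphi_3(u) &= \sqrt{7}\,(-1 + 12u - 30u^2 + 20u^3), \qquad \varphi_4(u) = \sqrt{9}\,(1 - 20u + 90u^2 - 140u^3 + 70u^4), \\ \varphi_5(u) &= \sqrt{11}\,(-1 + 30u - 210u^2 + 560u^3 - 630u^4 + 252u^5), \ \dots \end{aligned} $$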
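The Gram-Schmidt recursion can be carried out exactly with rational arithmetic, because $\int_0^1 u^{i+j}\, du = 1/(i+j+1)$. The Python sketch below (helper names hypothetical) orthogonalizes $1, u, \dots, u^5$ with `fractions.Fraction`, normalizes at the end in floating point, and reproduces the coefficients listed above.

```python
import math
from fractions import Fraction


def inner(p, q):
    """<p, q> = int_0^1 p(u) q(u) du for coefficient lists (constant term first)."""
    return sum(Fraction(a) * b / (i + j + 1)
               for i, a in enumerate(p) for j, b in enumerate(q))


def gram_schmidt(max_degree):
    """Orthonormalize 1, u, ..., u^max_degree on [0, 1]; returns float coefficient
    lists (constant term first) of phi_0, ..., phi_max_degree."""
    ortho = []  # exact, monic, mutually orthogonal polynomials
    for n in range(max_degree + 1):
        p = [Fraction(0)] * n + [Fraction(1)]  # the monomial u^n
        for q in ortho:
            c = inner(p, q) / inner(q, q)  # projection coefficient on q
            p = [pi - c * (q[i] if i < len(q) else 0) for i, pi in enumerate(p)]
        ortho.append(p)
    # normalize: phi_n = p_n / ||p_n||
    return [[float(c) / math.sqrt(inner(p, p)) for c in p] for p in ortho]


def evaluate(coeffs, u):
    """Evaluate a polynomial given by its coefficient list at u."""
    return sum(c * u ** k for k, c in enumerate(coeffs))
```

Working with exact fractions until the final normalization avoids the loss of orthogonality that classical Gram-Schmidt suffers in floating point as the degree grows.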
5.2 Constructing Legendre Multiwavelet Basis

The use of wavelets for density estimation has recently gained in popularity due to their ability to approximate a large class of functions, including those with localized, abrupt variations. However, a well-known attribute of wavelet bases is that they cannot be simultaneously symmetric, orthogonal, and compactly supported. A more general, vector-valued construction of wavelets, the multiwavelets, overcomes this disadvantage, which makes them natural choices for estimating density functions, many of which exhibit local symmetries around features such as a mode (Locke and Peter, 2012). Locke and Peter (2012) introduce the methodology of wavelet density estimation using multiwavelet bases and illustrate several empirical results where multiwavelet estimators outperform their wavelet counterparts at coarser resolution levels.

In this section, we use multiwavelet bases to approximate the minimum information copula. The main advantage of these bases over the polynomial bases introduced in the previous subsection is that wavelets (and in particular multiwavelets) are better suited to functions that contain discontinuities and sharp spikes. In addition, to preserve orthonormality among the multiwavelet bases, we use Legendre multiwavelet bases.

To construct these bases, we need some notions and definitions, which are briefly described in the following subsections.

5.2.1 Multiresolution Analysis

Wavelet theory is based on the idea of multiresolution analysis (MRA). Usually it is assumed that an MRA is generated by one scaling function, and that the dilates and translates of only one wavelet $\psi \in L^2(\mathbb{R})$ form a stable basis of $L^2(\mathbb{R})$.

We can generate a reference subspace (or sample space) $V_0$ as the $L^2$-closure of the linear span of the integer translates of the functions $\phi^m \in L^2(\mathbb{R})$, $m = 0, \ldots, r$, namely

$$V_0 = \operatorname{clos}_{L^2} \left\langle \phi^m(\cdot - k) : k \in \mathbb{Z},\ m = 0, \ldots, r \right\rangle,$$

and consider the subspaces

$$V_j = \operatorname{clos}_{L^2} \left\langle \phi^m_{j,k} : k \in \mathbb{Z},\ m = 0, \ldots, r \right\rangle, \qquad j \in \mathbb{Z},$$

where $\phi^m_{j,k}(x) = \phi^m(2^j x - k)$, $j, k \in \mathbb{Z}$, $m = 0, \ldots, r$.

We can now give a precise definition of multiresolution analysis.

Definition 1: Functions $\phi^m \in L^2(\mathbb{R})$, $m = 0, \ldots, r$, are said to generate a multiresolution analysis (MRA) if they generate a nested sequence of closed subspaces $V_j$ that satisfies

i) $\cdots \subset V_{-1} \subset V_0 \subset V_1 \subset \cdots$;

ii) $\operatorname{clos}_{L^2}\left( \bigcup_{j \in \mathbb{Z}} V_j \right) = L^2(\mathbb{R})$;

iii) $\bigcap_{j \in \mathbb{Z}} V_j = \{0\}$; (7)

iv) $\phi^m(x) \in V_j \iff \phi^m(x + 2^{-j}) \in V_j \iff \phi^m(2x) \in V_{j+1}$;

v) $\{\phi^m(\cdot - k)\}_{k \in \mathbb{Z},\ m = 0, \ldots, r}$ form a Riesz basis of $V_0$.

If the $\phi^m$ generate an MRA, they are called scaling functions. If the integer translates of the $\phi^m$ are mutually orthogonal with respect to the standard inner product $\langle f, g \rangle = \int_{-\infty}^{\infty} f(x)\,g(x)\,dx$ of two functions in $L^2(\mathbb{R})$, that is, $\phi^m(\cdot - k) \perp \phi^{\tilde m}(\cdot - \tilde k)$ for $(m, k) \neq (\tilde m, \tilde k)$, the scaling functions are called orthogonal scaling functions.

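As the subspaces $V_j$ are nested, there exist complementary orthogonal subspaces $W_j$ such that

$$V_{j+1} = V_j \oplus W_j, \qquad j \in \mathbb{Z},$$

where here and in the following $\oplus$ denotes an orthogonal sum. This yields an orthogonal decomposition of $L^2(\mathbb{R})$, namely

$$L^2(\mathbb{R}) = \bigoplus_{j \in \mathbb{Z}} W_j.$$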

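Definition 2: Functions $\psi^m \in L^2(\mathbb{R})$, $m = 0, \ldots, r$, are called wavelets if they generate the complementary orthogonal subspaces $W_j$ of an MRA, i.e.,

$$W_j = \operatorname{clos}_{L^2} \left\langle \psi^m_{j,k} : k \in \mathbb{Z},\ m = 0, \ldots, r \right\rangle, \qquad j \in \mathbb{Z},$$

where $\psi^m_{j,k}(x) = \psi^m(2^j x - k)$, $j, k \in \mathbb{Z}$.

Since the spaces $W_j$ are mutually orthogonal, $\psi^m_{j,k} \perp \psi^{\tilde m}_{\tilde j, \tilde k}$ whenever $j \neq \tilde j$. If, in addition,

$$\left\langle 2^{j/2} \psi^m_{j,k},\ 2^{\tilde j/2} \psi^{\tilde m}_{\tilde j, \tilde k} \right\rangle = \delta_{j,\tilde j}\,\delta_{k,\tilde k}\,\delta_{m,\tilde m},$$

then the $\psi^m$ are called orthogonal wavelets, where

$$\delta_{i,k} = \begin{cases} 1 & \text{for } i = k, \\ 0 & \text{for } i \neq k. \end{cases}$$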

We can now define the Legendre scaling functions and the corresponding multiwavelets according to the MRA definition given above.

5.2.2 Construction of Scaling Functions 

A Legendre multiwavelet system of order $r$ consists of $r + 1$ scaling functions and $r + 1$ wavelets. The $r$-th order Legendre scaling functions are the set of $r + 1$ functions $\phi^0(x), \ldots, \phi^r(x)$, where $\phi^i(x)$ is a polynomial of order $i$ and the $\phi^i$ form an orthonormal basis (Shamsi and Razzaghi, 2005), that is,

$$\phi^i(x) = \sum_{k=0}^{i} a_{ik}\, x^k, \qquad i = 0, 1, \ldots, r. \qquad (8)$$

The coefficients $a_{ik}$ are chosen so that $a_{ii} > 0$ and

$$\int_0^1 \phi^i(x)\,\phi^k(x)\,dx = \delta_{ik}, \qquad i, k = 0, 1, \ldots, r. \qquad (9)$$







The scaling functions $\phi^i(x)$ are symmetric about $x = 1/2$ for even $i$ and antisymmetric for odd $i$. The two-scale relations for the Legendre scaling functions of order $r$ take the form (Albert et al., 2002)

$$\phi^i(x) = \sum_{j=0}^{r} p_{ij}\,\phi^j(2x) + \sum_{j=0}^{r} p_{i,r+j+1}\,\phi^j(2x - 1), \qquad i = 0, 1, \ldots, r. \qquad (10)$$

The coefficients $p_{ij}$ are determined uniquely by substituting (8) into (10). We make two remarks on the two-scale relations.

1. Since $\phi^i(x)$ is a polynomial of order $i$, the right-hand side of (10) contains scaling functions of order at most $i$. Therefore, $p_{ij} = p_{i,r+j+1} = 0$ for $i < j$.

2. The two-scale relations for the Legendre scaling functions of an order $n$ lower than $r$ are a subset of the $r$-th order two-scale relations: they are the first $n + 1$ relations, $i = 0, 1, \ldots, n$.
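As a numerical check of (10) and of remark 1, the sketch below computes the two-scale coefficients by projection. It relies on an observation implied by (9): the functions $\phi^j(2x)$ are orthogonal on $[0, 1/2)$ with squared norm $1/2$, so $p_{ij} = 2\int_0^{1/2} \phi^i(x)\,\phi^j(2x)\,dx = \int_0^1 \phi^i(t/2)\,\phi^j(t)\,dt$, and similarly $p_{i,r+j+1} = \int_0^1 \phi^i((t+1)/2)\,\phi^j(t)\,dt$. The scaling functions are generated from the closed form $\sqrt{2i+1}$ times the shifted Legendre polynomial, which reproduces the functions listed in (14); all helper names are hypothetical.

```python
from math import comb, sqrt

# Illustrative sketch; function names are hypothetical.


def legendre_scaling(i):
    """Coefficients (lowest degree first) of phi^i(x) = sqrt(2i + 1) * shifted Legendre P_i(x) on [0, 1)."""
    return [sqrt(2 * i + 1) * (-1) ** (i + k) * comb(i, k) * comb(i + k, k) for k in range(i + 1)]


def evaluate(p, x):
    """Horner's rule for a coefficient list."""
    result = 0.0
    for c in reversed(p):
        result = result * x + c
    return result


def poly_mul(p, q):
    out = [0.0] * (len(p) + len(q) - 1)
    for a, pa in enumerate(p):
        for b, qb in enumerate(q):
            out[a + b] += pa * qb
    return out


def integrate01(p):
    """Exact integral over [0, 1] of a polynomial."""
    return sum(c / (k + 1) for k, c in enumerate(p))


def compose_affine(p, scale, shift):
    """Coefficients of t -> p(scale * t + shift), expanded with the binomial theorem."""
    out = [0.0] * len(p)
    for k, c in enumerate(p):
        for m in range(k + 1):
            out[m] += c * comb(k, m) * scale ** m * shift ** (k - m)
    return out


def two_scale_coefficients(r):
    """Row i holds p_{i,0..r} (left half) followed by p_{i,r+1..2r+1} (right half), as in (10)."""
    phis = [legendre_scaling(i) for i in range(r + 1)]
    table = []
    for i in range(r + 1):
        left = compose_affine(phis[i], 0.5, 0.0)    # phi^i(t / 2)
        right = compose_affine(phis[i], 0.5, 0.5)   # phi^i((t + 1) / 2)
        row = [integrate01(poly_mul(left, phis[j])) for j in range(r + 1)]
        row += [integrate01(poly_mul(right, phis[j])) for j in range(r + 1)]
        table.append(row)
    return table


if __name__ == "__main__":
    for row in two_scale_coefficients(2):
        print([round(p, 4) for p in row])   # zeros above the diagonal, as remark 1 states
```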

5.2.3 Construction of Wavelets 

The two-scale relation for the $r$-th order Legendre multiwavelets has the following form (Albert et al., 2002):

$$\psi^i(x) = \sum_{j=0}^{r} q_{ij}\,\phi^j(2x) + \sum_{j=0}^{r} q_{i,r+j+1}\,\phi^j(2x - 1), \qquad i = 0, 1, \ldots, r. \qquad (11)$$

The $2(r+1)^2$ unknown coefficients $q_{ij}$ in (11) can be determined from the following $2r(r+1)$ vanishing moment conditions and $2(r+1)$ orthogonality conditions.

Vanishing moments:

$$\int_0^1 \psi^i(x)\, x^j\,dx = 0, \qquad i = 0, 1, \ldots, r; \quad j = 0, 1, \ldots, i + r. \qquad (12)$$

Orthogonality:

$$\int_0^1 \psi^i(x)\,\psi^j(x)\,dx = \delta_{ij}, \qquad i, j = 0, 1, \ldots, r. \qquad (13)$$
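One way to solve (12) and (13) numerically is sketched below. It is an illustration under stated assumptions, not necessarily the procedure used to obtain the closed forms that follow. Each $\psi^i$ is written in the orthonormal basis $\sqrt{2}\,\phi^j(2x)$, $\sqrt{2}\,\phi^j(2x-1)$ of $V_1$, so the coefficients in (11) are $q_{ij} = \sqrt{2}\,c_j$. $\psi^r$ is found first as a unit vector annihilating the moments $x^0, \ldots, x^{2r}$; then each $\psi^i$ ($i = r-1, \ldots, 0$) is a unit vector satisfying its moment conditions and orthogonal to the $\psi^l$ already found. The sign convention (first non-negligible coefficient positive) is an assumption, and all names are hypothetical.

```python
from math import comb, sqrt

# Illustrative sketch; function names and the sign convention are assumptions.


def legendre_scaling(i):
    return [sqrt(2 * i + 1) * (-1) ** (i + k) * comb(i, k) * comb(i + k, k) for k in range(i + 1)]


def poly_mul(p, q):
    out = [0.0] * (len(p) + len(q) - 1)
    for a, pa in enumerate(p):
        for b, qb in enumerate(q):
            out[a + b] += pa * qb
    return out


def integrate01(p):
    return sum(c / (k + 1) for k, c in enumerate(p))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def moment_row(r, k):
    """int x^k b(x) dx for b = sqrt(2) phi^j(2x) on [0, 1/2) and sqrt(2) phi^j(2x - 1) on [1/2, 1).
    With t = 2x - shift this equals sqrt(2) * 2^-(k+1) * int_0^1 phi^j(t) (t + shift)^k dt."""
    row = []
    for shift in (0, 1):
        tk = [comb(k, m) * shift ** (k - m) for m in range(k + 1)]   # (t + shift)^k
        for j in range(r + 1):
            row.append(sqrt(2) * 0.5 ** (k + 1) * integrate01(poly_mul(legendre_scaling(j), tk)))
    return row


def orthogonalise(v, basis):
    """Remove from v its components along an orthonormal list (two passes for stability)."""
    for _ in range(2):
        for b in basis:
            d = dot(v, b)
            v = [vi - d * bi for vi, bi in zip(v, b)]
    return v


def null_vector(rows, n):
    """A unit vector orthogonal to every row (the rows need not be independent)."""
    basis = []
    for row in rows:
        v = orthogonalise(list(row), basis)
        norm = sqrt(dot(v, v))
        if norm > 1e-10:
            basis.append([vi / norm for vi in v])
    candidates = [orthogonalise([1.0 if i == e else 0.0 for i in range(n)], basis) for e in range(n)]
    best = max(candidates, key=lambda v: dot(v, v))
    norm = sqrt(dot(best, best))
    return [vi / norm for vi in best]


def multiwavelets(r):
    """Coefficient vectors c^0..c^r of psi^0..psi^r in the V_1 basis (q_{ij} = sqrt(2) * c^i_j)."""
    n = 2 * (r + 1)
    psi = [None] * (r + 1)
    for i in range(r, -1, -1):
        rows = [moment_row(r, k) for k in range(i + r + 1)] + psi[i + 1:]
        c = null_vector(rows, n)
        lead = next(x for x in c if abs(x) > 1e-9)
        psi[i] = c if lead > 0 else [-x for x in c]
    return psi
```

For $r = 0$ this returns the Haar wavelet, with coefficients $\pm 1/\sqrt{2}$ on the two halves.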

For example, the Legendre scaling functions of order 5 consist of the following 6 functions:

$$\begin{aligned}
\phi^0(x) &= 1, \\
\phi^1(x) &= \sqrt{3}\,(-1 + 2x), \\
\phi^2(x) &= \sqrt{5}\,(1 - 6x + 6x^2), \\
\phi^3(x) &= \sqrt{7}\,(-1 + 12x - 30x^2 + 20x^3), \\
\phi^4(x) &= \sqrt{9}\,(1 - 20x + 90x^2 - 140x^3 + 70x^4), \\
\phi^5(x) &= \sqrt{11}\,(-1 + 30x - 210x^2 + 560x^3 - 630x^4 + 252x^5),
\end{aligned} \qquad 0 \le x < 1. \qquad (14)$$
The closed-form Legendre multiwavelets of order 5, $\psi^0(x), \psi^1(x), \ldots, \psi^5(x)$, determined from conditions (12) and (13), are given below.

$$\psi^0(x) = \begin{cases} 3.55 - 146.72x + 1419.86x^2 - 5300.81x^3 + 8519.15x^4 - 4997.9x^5, & 0 \le x < \tfrac12, \\ -502.87 + 4122.32x - 13346.68x^2 + 21203.23x^3 - 16470.37x^4 + 4997.907x^5, & \tfrac12 \le x < 1, \end{cases}$$

$$\psi^1(x) = \begin{cases} -3.47 + 181.55x - 2188.78x^2 + 10023.38x^3 - 19433.09x^4 + 13500.89x^5, & 0 \le x < \tfrac12, \\ -2080.47 + 15646.19x - 46291.67x^2 + 67299.87x^3 - 48071.33x^4 + 13500.89x^5, & \tfrac12 \le x < 1, \end{cases}$$

$$\psi^2(x) = \begin{cases} 2.81 - 174.03x + 2438.52x^2 - 12760.78x^3 + 27823.96x^4 - 21415.36x^5, & 0 \le x < \tfrac12, \\ -4084.87 + 29360.26x - 83053.61x^2 + 1.16 \times 10^5 x^3 - 79252.82x^4 + 21415.36x^5, & \tfrac12 \le x < 1, \end{cases}$$

$$\psi^3(x) = \begin{cases} 1.71 - 121.14x + 1911.69x^2 - 11113.58x^3 + 26588.59x^4 - 22203.27x^5, & 0 \le x < \tfrac12, \\ 4935.99 - 34300.49x + 93930.24x^2 - 1.27 \times 10^5 x^3 + 84427.78x^4 - 22203.27x^5, & \tfrac12 \le x < 1, \end{cases}$$

$$\psi^4(x) = \begin{cases} -0.71 + 56.63x - 998.10x^2 + 6413.33x^3 - 16797.83x^4 + 15222.11x^5, & 0 \le x < \tfrac12, \\ 3895.43 - 26219.63x + 69675.97x^2 - 91443.07x^3 + 59312.70x^4 - 15222.11x^5, & \tfrac12 \le x < 1, \end{cases}$$

$$\psi^5(x) = \begin{cases} 0.17 - 15.67x + 308.12x^2 - 2193.38x^3 + 6324.24x^4 - 6273.06x^5, & 0 \le x < \tfrac12, \\ 1849.58 - 12047.91x + 31057.19x^2 - 39627.04x^3 + 25041.07x^4 - 6273.06x^5, & \tfrac12 \le x < 1. \end{cases}$$

6 Application: Norwegian Financial Returns

In this section, we apply the approximation method presented in this paper, using the basis functions introduced in the previous section as the basis families $S(k)$ (as in the first step of the algorithm above), to approximate the multivariate distribution associated with the selected vine structure for the Norwegian financial returns. We then demonstrate the flexibility of our approach by comparing it with the other methods cited in Bedford et al. (2012) and Aas et al. (2009).

Example: In this example we use the same data set as Aas et al. (2009) and Bedford et al. (2012) to illustrate the approximation method introduced in this paper. The data consist of four daily time series: the Norwegian stock index (TOTX), the MSCI world stock index, the Norwegian bond index (BRIX) and the SSBWG hedged bond index, recorded over the period 04.01.1999 to 08.07.2003, giving 1094 observations. We denote these four variables by T, M, B and S, respectively.

Figure 2: Selected vine structure for the Norwegian stock data set with 4 variables: Norwegian stock index (T), MSCI world stock index (M), Norwegian bond index (B) and SSBWG hedged bond index (S)

We first remove the serial correlation in these four time series, so that the observations of each variable can be treated as independent over time. To this end, the serial correlation in the conditional mean and in the conditional variance is modelled by an AR(1) and a GARCH(1,1) model (Bollerslev, 1986), respectively. Thus, the following model for the log-return $X_i$ of the $i$-th time series is considered:

$$X_{i,t} = c_i + \alpha_i X_{i,t-1} + \sigma_{i,t} z_{i,t}, \qquad E[z_{i,t}] = 0, \quad \operatorname{Var}[z_{i,t}] = 1,$$
$$\sigma^2_{i,t} = \omega_i + a_i\,\varepsilon^2_{i,t-1} + \beta_i\,\sigma^2_{i,t-1},$$

where $\varepsilon_{i,t-1} = \sigma_{i,t-1} z_{i,t-1}$ (see Aas et al., 2009).
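To make the filtering step concrete, the sketch below computes the standardized residuals $z_{i,t}$ of one series for given parameter values. Parameter estimation (for example by maximum likelihood) is not shown. The variance recursion is started at the unconditional variance $\omega/(1 - a - \beta)$, an assumption that requires $a + \beta < 1$, and all function names are hypothetical. A simulator is included so the filter can be checked: filtering a simulated path with the true parameters returns the original innovations.

```python
import math
import random

# Illustrative sketch; function names and the variance initialisation are assumptions.


def simulate_ar1_garch11(z, c, alpha, omega, a, beta, x0=0.0):
    """Generate x_1..x_T from innovations z under the AR(1)-GARCH(1,1) model above."""
    sigma2 = omega / (1.0 - a - beta)          # unconditional variance as the start value
    x = [x0]
    eps = 0.0
    for t, zt in enumerate(z):
        if t > 0:
            sigma2 = omega + a * eps ** 2 + beta * sigma2
        eps = math.sqrt(sigma2) * zt
        x.append(c + alpha * x[-1] + eps)
    return x


def standardized_residuals(x, c, alpha, omega, a, beta):
    """Invert the model: z_t = eps_t / sigma_t, with eps_t = x_t - c - alpha * x_{t-1}."""
    sigma2 = omega / (1.0 - a - beta)
    z = []
    eps = 0.0
    for t in range(1, len(x)):
        if t > 1:
            sigma2 = omega + a * eps ** 2 + beta * sigma2
        eps = x[t] - c - alpha * x[t - 1]
        z.append(eps / math.sqrt(sigma2))
    return z


if __name__ == "__main__":
    rng = random.Random(1)
    params = (0.0, 0.1, 0.02, 0.08, 0.9)       # illustrative values, not estimates
    x = simulate_ar1_garch11([rng.gauss(0, 1) for _ in range(1094)], *params)
    print(standardized_residuals(x, *params)[:3])
```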

The further analysis is performed on the standardized residuals $z_{i,t}$. If the AR(1)-GARCH(1,1) models successfully capture the serial correlation in the conditional mean and the conditional variance, there should be no autocorrelation left in the standardized residuals or in the squared standardized residuals. We can check this with the modified Q-statistic and the Lagrange multiplier test, respectively (Aas et al., 2009). For all series, neither test rejects the null hypothesis of no remaining autocorrelation at the 5% level. Since we are mainly interested in estimating the dependence structure of the risk factors, the standardized residual vectors are converted to uniform variables using the kernel method before further modelling. We denote the converted time series of T, M, B and S by X, Y, Z and U, respectively.
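The kernel method is not spelled out here; one common reading is to replace each residual by a kernel-smoothed estimate of its marginal distribution function. The sketch below uses a Gaussian kernel with Silverman's rule-of-thumb bandwidth. Both choices are assumptions, and the function name is hypothetical.

```python
import math
import statistics

# Illustrative sketch; kernel, bandwidth rule and name are assumptions.


def kernel_cdf_transform(sample, bandwidth=None):
    """Map each observation x to F_hat(x) = (1/n) * sum_j Phi((x - x_j) / h),
    a Gaussian-kernel estimate of its marginal CDF, so outputs lie strictly in (0, 1)."""
    n = len(sample)
    if bandwidth is None:
        # Silverman's rule of thumb (assumed; no bandwidth is stated in the text)
        bandwidth = 1.06 * statistics.stdev(sample) * n ** (-0.2)

    def std_normal_cdf(t):
        return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

    return [sum(std_normal_cdf((x - xj) / bandwidth) for xj in sample) / n for x in sample]
```

Applied to each of the four residual series, this would give uniform-scale series playing the roles of X, Y, Z and U. A convenient sanity check is that the transformed values always average exactly 1/2, because $\Phi(a) + \Phi(-a) = 1$.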

We now derive a vine approximation to the multivariate density of this data set using minimum information distributions. We adopt the vine structure shown in Figure 2 for these data. Note that functions of the copula variables X, Y, Z and U corresponding to functions of T, M, B and S can be derived. For instance, $\tilde h_1(X, Y) = h_1(F_T^{-1}(X), F_M^{-1}(Y))$, which has the same expectation, that is, $E(h_1(T, M)) = E(\tilde h_1(X, Y))$. The minimum information copulas in this example are calculated from the copula variables X, Y, Z and U. We first construct minimally informative copulas between each pair of adjacent variables in the first tree, $T_1$. For each of these we must decide which basis functions to use and how many discretization points to use. We start by illustrating our procedure for the first copula in the first tree, between T and M.
Figure 3: The log-likelihood of the minimally informative copula between T and M, calculated with the orthonormal polynomial basis (*) and the Legendre multiwavelet basis (△).

We could simply choose basis functions, starting with simple orthonormal polynomials or Legendre multiwavelets and moving to more complex ones, and keep adding them until we are satisfied with the approximation. We included the following orthonormal polynomial basis functions, constructed by the Gram-Schmidt process, in this order:

$$\varphi_1(x)\varphi_1(y),\ \varphi_1(x)\varphi_2(y),\ \varphi_2(x)\varphi_1(y),\ \varphi_1(x)\varphi_3(y),\ \varphi_3(x)\varphi_1(y),$$
$$\varphi_2(x)\varphi_2(y),\ \varphi_2(x)\varphi_3(y),\ \varphi_3(x)\varphi_2(y),\ \varphi_1(x)\varphi_4(y),\ \varphi_4(x)\varphi_1(y),$$
$$\varphi_1(x)\varphi_5(y),\ \varphi_5(x)\varphi_1(y),\ \varphi_2(x)\varphi_4(y),\ \varphi_4(x)\varphi_2(y),\ \varphi_3(x)\varphi_3(y),\ \ldots$$

and the following Legendre multiwavelet basis functions, constructed by the method presented in Subsection 5.2:

0i(x)V°(2/), 0'(x)^°(y), 0i(a;)V/(y), <^2(a;)V^(y), 

cP^x)i:°{y),<l>\x)cl,\y),i,\x)i:\y),ij°{x)^/j^{y),^P^{x)i:°{y), 
4>\x)4>\y), 0'(x)0i(2/), ^''{x)ip^{y),^\x)ip\y)^\x)ip\y), . . . 






Bedford et al. (2012) show that adding the basis functions in this fixed order is not optimal, and propose a method similar to stepwise regression: at each stage, we assess the log-likelihood obtained by adding each remaining candidate basis function, and include the one that produces the largest increase in the log-likelihood. We are currently investigating other methods, such as genetic algorithms, particle swarm optimisation (PSO), the lasso and ant-colony algorithms, to find an optimal set of basis functions, in the sense of attaining the largest log-likelihood with as few basis functions as possible.
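The stepwise selection can be sketched as a greedy forward search. This is an illustration only: `loglik` is a hypothetical user-supplied callable that fits the minimum information copula (for example with the D1AD2 algorithm) for a given list of basis functions and returns its log-likelihood, and the stopping rule `min_gain` is an assumption.

```python
# Illustrative sketch of greedy forward (stepwise) selection; names are hypothetical.


def forward_stepwise(candidates, loglik, max_terms, min_gain=0.0):
    """At each stage add the candidate whose inclusion gives the largest log-likelihood.
    Stops after max_terms additions or when the best gain is not above min_gain."""
    selected = []
    current = loglik(selected)
    history = [current]
    remaining = list(candidates)
    while remaining and len(selected) < max_terms:
        scores = [(loglik(selected + [c]), c) for c in remaining]
        best_score, best = max(scores, key=lambda s: s[0])
        if best_score - current <= min_gain:
            break
        selected.append(best)
        remaining.remove(best)
        current = best_score
        history.append(current)
    return selected, history
```

With candidates such as the product functions $\varphi_k(x)\varphi_l(y)$ listed above, `history` traces a curve of the kind shown in Figure 3.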

Figure 3 shows how the log-likelihood changes as basis functions are added, for the orthonormal polynomial (*) and Legendre multiwavelet (△) bases. To compare our results with the approximations of Bedford et al. (2012), which use the ordinary polynomial series, we choose six orthonormal basis functions with the stepwise method:

$$\varphi_1(T)\varphi_1(M),\ \varphi_2(T)\varphi_2(M),\ \varphi_1(T)\varphi_2(M),\ \varphi_1(T)\varphi_3(M),\ \varphi_3(T)\varphi_3(M),\ \varphi_4(T)\varphi_1(M)$$

and we also choose the following six Legendre multiwavelet basis functions:

01 {T)<t>^ {M),cb' (T)<l>' (M) , 04 (r)05 (Af ) , 01 (T)02 (M) , cf,^ {T)<j>^ (M) , ( j.)^4 

The corresponding log-likelihood reaches 60.66 with the orthonormal polynomial functions and 63.36 with the Legendre multiwavelets; both exceed the log-likelihood of 58.1256 obtained with six basis functions in Bedford et al. (2012). The expectations of the selected orthonormal polynomial basis functions, computed from the Norwegian financial returns data, are

$$a_1 = -0.2292,\ a_2 = 0.2104,\ a_3 = 0.0808,\ a_4 = -0.1025,\ a_5 = -0.1120,\ a_6 = 0.0463,$$

and those for the selected Legendre multiwavelet basis functions are

$$a_1 = 0.4803,\ a_2 = 0.2298,\ a_3 = -0.0021,\ a_4 = 0.0194,\ a_5 = 0.0866,\ a_6 = 0.0191.$$
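The constraints $a_k$ are sample means of the selected basis functions evaluated at the copula data. A minimal sketch, assuming the data are already on the uniform scale and using the closed form $\varphi_k(u) = \sqrt{2k+1}$ times the shifted Legendre polynomial for the orthonormal polynomials (helper names are hypothetical):

```python
from math import comb, sqrt

# Illustrative sketch; function names are hypothetical.


def phi(k, u):
    """Orthonormal polynomial phi_k on [0, 1]: sqrt(2k+1) times the shifted Legendre P_k."""
    return sqrt(2 * k + 1) * sum((-1) ** (k + j) * comb(k, j) * comb(k + j, j) * u ** j
                                 for j in range(k + 1))


def empirical_constraints(pairs, data):
    """a_m = sample mean of phi_k(x) * phi_l(y) over copula data [(x, y), ...], for (k, l) in pairs."""
    n = len(data)
    return [sum(phi(k, x) * phi(l, y) for x, y in data) / n for k, l in pairs]
```

For the T-M copula, `pairs` would be `[(1, 1), (2, 2), (1, 2), (1, 3), (3, 3), (4, 1)]`, following the selection order given above.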

We are now able to construct the minimum information copula $C_{TM}$ relative to the uniform distribution, subject to the constraints given by the selected basis functions and their expectations reported above, using the method described in this paper. We first need to determine the number of discretization points (the grid size). A larger grid gives a better approximation to the continuous copula, but at the cost of more computation time. Similarly, running more iterations of the D1AD2 algorithm makes the approximation more precise, again at the cost of computation time. Bedford et al. (2012) show that the number of iterations needed also depends on the grid size; the errors they consider range from 1 × 10^… to 1 × 10^…. At every error level, larger grids need more iterations to converge. All grid sizes follow the same pattern: the number of iterations rises steeply as the error tolerance is first tightened, and more slowly once the error is already small. We use a grid size of 200 × 200 throughout this example.
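The D1AD2 step can be sketched as an alternating row and column rescaling of the discretized kernel (a Sinkhorn-Knopp type iteration). In this sketch, which is an illustration and not the authors' implementation, the kernel on an $n \times n$ grid of cell midpoints is $A_{ij} = \exp\left(\sum_m \lambda_m h_m(u_i, v_j)\right)$ with each $h_m$ a product $\varphi_k(u)\varphi_l(v)$; the stopping tolerance is on the row sums, and the names are hypothetical.

```python
import math
from math import comb

# Illustrative sketch; function names are hypothetical.


def phi(k, u):
    """Orthonormal polynomial phi_k on [0, 1]."""
    return math.sqrt(2 * k + 1) * sum((-1) ** (k + j) * comb(k, j) * comb(k + j, j) * u ** j
                                      for j in range(k + 1))


def kernel(n, lambdas, pairs):
    """A_ij = exp(sum_m lambda_m * phi_k(u_i) * phi_l(v_j)) on the n x n grid of cell midpoints."""
    grid = [(i + 0.5) / n for i in range(n)]
    return [[math.exp(sum(lam * phi(k, u) * phi(l, v) for lam, (k, l) in zip(lambdas, pairs)))
             for v in grid] for u in grid]


def d1ad2(A, tol=1e-10, max_iter=10_000):
    """Find diagonal D1, D2 so that P = D1 A D2 has every row and column sum equal to 1/n."""
    n = len(A)
    d1, d2 = [1.0] * n, [1.0] * n
    for it in range(1, max_iter + 1):
        d1 = [1.0 / (n * sum(A[i][j] * d2[j] for j in range(n))) for i in range(n)]   # fix rows
        d2 = [1.0 / (n * sum(d1[i] * A[i][j] for i in range(n))) for j in range(n)]   # fix columns
        row_err = max(abs(d1[i] * sum(A[i][j] * d2[j] for j in range(n)) - 1.0 / n) for i in range(n))
        if row_err < tol:
            break
    P = [[d1[i] * A[i][j] * d2[j] for j in range(n)] for i in range(n)]
    return P, it
```

For example, `d1ad2(kernel(200, [-0.1995, 0.1651], [(1, 1), (2, 2)]))` would scale a 200 × 200 kernel built from the first two multipliers reported below, and $n^2 P_{ij}$ is then the discretized copula density. Pure Python is slow at that size; an array library would be the practical choice.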

Based on the grid size, number of iterations and error level discussed above, we can derive the minimum information copula $C_{TM}$ associated with the chosen constraints. The copula based on the orthonormal polynomial basis is plotted in Figure 4, and the copula based on the Legendre multiwavelet basis functions is plotted in Figure 5. The Lagrange multipliers (parameter values) of the approximated copula density based on the orthonormal polynomial basis are

$$\lambda_1 = -0.1995,\ \lambda_2 = 0.1651,\ \lambda_3 = 0.0912,\ \lambda_4 = -0.0774,\ \lambda_5 = -0.0772,\ \lambda_6 = 0.0527,$$

and, similarly, the parameter values of the minimum information copula based on the Legendre multiwavelet basis are

$$\lambda_1 = 1.9845,\ \lambda_2 = 1.6158,\ \lambda_3 = 0.0023,\ \lambda_4 = -0.0263,\ \lambda_5 = -7.4167,\ \lambda_6 = 3.6819.$$
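The Lagrange multipliers themselves must be chosen so that the fitted copula reproduces the constraints $a_k$. One way to do this, sketched below as an assumption rather than the authors' exact procedure, is gradient ascent on the dual problem: for fixed $\lambda$ the D1AD2 step gives the grid distribution $P$, and the gradient with respect to $\lambda_m$ is $a_m - E_P[h_m]$. The step size, tolerances and names are illustrative.

```python
import math
from math import comb

# Illustrative sketch; the dual-ascent scheme and all names are assumptions.


def phi(k, u):
    """Orthonormal polynomial phi_k on [0, 1]."""
    return math.sqrt(2 * k + 1) * sum((-1) ** (k + j) * comb(k, j) * comb(k + j, j) * u ** j
                                      for j in range(k + 1))


def scale_to_uniform_margins(A, tol=1e-12, max_iter=10_000):
    """D1AD2 step: P = D1 A D2 with all row and column sums equal to 1/n."""
    n = len(A)
    d2 = [1.0] * n
    for _ in range(max_iter):
        d1 = [1.0 / (n * sum(A[i][j] * d2[j] for j in range(n))) for i in range(n)]
        d2 = [1.0 / (n * sum(d1[i] * A[i][j] for i in range(n))) for j in range(n)]
        if max(abs(d1[i] * sum(A[i][j] * d2[j] for j in range(n)) - 1.0 / n) for i in range(n)) < tol:
            break
    return [[d1[i] * A[i][j] * d2[j] for j in range(n)] for i in range(n)]


def fit_multipliers(pairs, targets, n=20, step=0.5, tol=1e-6, max_steps=1000):
    """Update lambda_m += step * (a_m - E_P[h_m]) until every constraint gap is below tol.
    Returns the multipliers, the matching grid distribution P and the final gaps."""
    grid = [(i + 0.5) / n for i in range(n)]
    H = [[[phi(k, u) * phi(l, v) for v in grid] for u in grid] for k, l in pairs]
    lambdas = [0.0] * len(pairs)
    for it in range(max_steps):
        A = [[math.exp(sum(lam * Hm[i][j] for lam, Hm in zip(lambdas, H))) for j in range(n)]
             for i in range(n)]
        P = scale_to_uniform_margins(A)
        moments = [sum(P[i][j] * Hm[i][j] for i in range(n) for j in range(n)) for Hm in H]
        gaps = [a - m for a, m in zip(targets, moments)]
        if max(abs(g) for g in gaps) < tol or it == max_steps - 1:
            break
        lambdas = [lam + step * g for lam, g in zip(lambdas, gaps)]
    return lambdas, P, gaps
```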

Figure 4: The minimally informative copula between T and M using the orthonormal polynomial bases (panel title: "Minimally informative copula given the experts' assessments").

Figure 5: The minimally informative copula between T and M using the Legendre multiwavelets bases (panel title: "Minimally informative copula given the experts' assessments").

One of the main advantages of the orthonormal polynomial and Legendre multiwavelet basis functions over the ordinary polynomial series considered in Bedford et al. (2012) is that the D1AD2 algorithm converges faster with these bases. This is due to the property of these two bases that adding a new basis function to the kernel defined in (3), which is used to construct the minimum information copula, does not change the Lagrange multipliers of the basis functions already in the kernel. Table 1 shows this for the orthonormal polynomial basis functions. This is not the case when the ordinary polynomial bases (as proposed by Bedford et al., 2012) are used to calculate the minimum information copula: the D1AD2 algorithm must then be rerun each time a new basis function is added to those already chosen, and the parameter values change accordingly, so more iterations are required for the D1AD2 algorithm to converge. The optimisation
Base | Parameter values | Log-likelihood
$\varphi_1(T)\varphi_1(M)$ | -0.1995 | 29.36
Previous bases + $\varphi_2(T)\varphi_2(M)$ | -0.1995, 0.1651 | 49.2
Previous bases + $\varphi_1(T)\varphi_2(M)$ | -0.1995, 0.1651, 0.0912 | 52.8
Previous bases + $\varphi_1(T)\varphi_3(M)$ | -0.1995, 0.1651, 0.0912, -0.0774 | 56.16
Previous bases + $\varphi_3(T)\varphi_3(M)$ | -0.1995, 0.1651, 0.0912, -0.0774, -0.0772 | 59.04
Previous bases + $\varphi_4(T)\varphi_1(M)$ | -0.1995, 0.1651, 0.0912, -0.0774, -0.0772, 0.0527 | 60.66

Table 1: Adding new orthonormal polynomial basis functions did not change the Lagrange multipliers



Type of bases | Number of bases | Log-likelihood
Ordinary polynomial (Bedford et al. 2012) | 6 | 58.1256
Orthonormal polynomial (Subsection 5.1) | 6 | 60.66
Legendre multiwavelets (Subsection 5.2) | 6 | 63.36

Table 2: Log-likelihoods of the minimum information copulae for different basis functions



time required for the D1AD2 algorithm is only 35.8646 seconds with the orthonormal polynomial bases and 29.359 seconds with the Legendre multiwavelets, whereas it is 72.93 seconds with the ordinary polynomial bases: roughly twice the former and about two and a half times the latter.

Furthermore, comparing the log-likelihoods of the minimum information copulas based on the ordinary polynomial, orthonormal polynomial and Legendre multiwavelet bases, we conclude that the Legendre multiwavelets give the most accurate copula density approximation, in the sense that the corresponding log-likelihood is the largest. The log-likelihoods of these approximated copulae are given in Table 2. Note that the log-likelihood of the copula approximated with only 5 orthonormal polynomial or Legendre multiwavelet bases is still larger than that of the copula fitted with six ordinary polynomial bases. In addition, we found that the approximations based on the bases proposed in this paper are more flexible than those based on ordinary polynomial bases, since they are not sensitive to the initial values chosen for the parameters (Lagrange multipliers) in the D1AD2 algorithm.
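The log-likelihoods compared in Tables 1 and 2 are sums of log copula densities at the data points. For a copula computed on an $n \times n$ grid, a minimal sketch, assuming a piecewise-constant density $n^2 P_{ij}$ on each grid cell (the function name is hypothetical), is:

```python
import math

# Illustrative sketch; the piecewise-constant density and the name are assumptions.


def grid_log_likelihood(P, data):
    """Sum of log c(u, v) over data [(u, v), ...], where c = n^2 * P[i][j] on grid cell (i, j)."""
    n = len(P)
    total = 0.0
    for u, v in data:
        i = min(int(u * n), n - 1)   # cell containing u (u = 1 goes to the last cell)
        j = min(int(v * n), n - 1)
        total += math.log(n * n * P[i][j])
    return total
```

Under the independence copula ($P_{ij} = 1/n^2$) every term is zero, so the log-likelihood measures the gain over independence.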

The second copula in the first tree ($T_1$) is $C_{MB}$. Using the stepwise method, we choose the following orthonormal polynomial bases:

h[{M,B)^ipi{M)ipi{B), h'2{M,B)^ip2{M)ip2{B), h^{M,B) = ^i{M)^^{B), 
h'^{M,B) = ip2{M)MB), h'^{M,B) = ipi{M)^iiB), K{M,B) = ip^{M)ip5{B) 
and we also select the following Legendre multiwavelet basis functions:

h[{M,B) = (t)\M)(t>\B), h^{M,B) = (f>'^{M)(l)'^{B), h:i{M,B) ^ (j}'^ {M)(j)^{B), 
h'^{M,B) = <p\M)cf>\B), h'^iM,B) = V'(M)</)i(i?), K{M,B) = i?{M)cf>\B) 

We construct the minimally informative copula associated with the orthonormal polynomial bases in the same way; it is shown in Figure 6. The minimum information copulas for the orthonormal polynomial and Legendre multiwavelet bases are quite similar, but the latter is somewhat smoother. The constraints, i.e., the means of the chosen orthonormal polynomial basis functions for the Norwegian financial returns data, are

$$a_1 = 0.4803,\ a_2 = 0.2298,\ a_3 = 0.0841,\ a_4 = 0.0989,\ a_5 = 0.0757,\ a_6 = -0.0112.$$

The parameter values of the minimum information copula fitted to the data with these constraints are

$$\lambda_1 = 0.5701,\ \lambda_2 = 0.0847,\ \lambda_3 = 0.0433,\ \lambda_4 = 0.1000,\ \lambda_5 = 0.0830,\ \lambda_6 = -0.0531.$$

The constraints for the Legendre multiwavelet bases are

$$a_1 = 0.4803,\ a_2 = 0.2298,\ a_3 = 0.0989,\ a_4 = 0.0757,\ a_5 = 0.0531,\ a_6 = 0.0463,$$

and the corresponding parameter values are

$$\lambda_1 = 377.3642,\ \lambda_2 = 193.9254,\ \lambda_3 = 253.2358,\ \lambda_4 = 281.7057,\ \lambda_5 = -622.0234,\ \lambda_6 = 12.2802.$$

The log-likelihoods for the orthonormal polynomial and Legendre multiwavelet bases are 158.0013 and 159.72, respectively, which again exceed the log-likelihood obtained with the ordinary polynomial bases.

The third marginal copula is between B and S. Again, six bases are selected with the stepwise procedure, and the corresponding constraints and resulting Lagrange multipliers are given in Table 3 and Table 4 for the orthonormal polynomial and Legendre multiwavelet bases, respectively. The approximated minimally informative copula based on the orthonormal polynomial bases is shown in Figure 7. The minimum information copula based on the Legendre multiwavelet bases is very similar to the one in Figure 7, but slightly smoother.

The conditional copulas in the second tree, $T_2$, can be approximated with the minimum information approach in the same way. We only illustrate the construction of the conditional minimally informative copula between T|M and B|M; the other conditional copulas in this tree are approximated similarly. To calculate this copula, we divide the support of M into arbitrarily chosen sub-intervals (bins) and construct the conditional copula within each bin. To do so, we select bases in the same way as for the marginal copulas and fit the copulae to the calculated mean values
Figure 6: The minimally informative copula between M and B using the orthonormal polynomial bases

Figure 7: The minimally informative copula between B and S using the orthonormal polynomial bases
k | $a_k$ | $\lambda_k$
1 | -0.1557 | -0.1467
2 | 0.1010 | 0.0836
3 | -0.0510 | -0.0426
4 | -0.0378 | -0.0365
5 | 0.0253 | 0.0257
6 | 0.0222 | 0.0240
Log-likelihood: 20.13

Table 3: The minimally informative copula for orthonormal polynomial bases between B and S

k | $a_k$ | $\lambda_k$
1 | 0.4803 | -70.35
2 | 0.2298 | 91.72
3 | 0.0539 | 16.61
4 | 0.0531 | -22.29
5 | 0.0011 | 2.39
6 | -0.0098 | -3.49
Log-likelihood: 25.07

Table 4: The minimally informative copula for Legendre multiwavelets bases between B and S



or constraints. Here we use four bins, so the first copula is for T, B | M ∈ (0, 0.25). The orthonormal polynomial bases for this copula are

$$h_1 = \varphi_2(T)\varphi_1(B),\quad h_2 = \varphi_5(T)\varphi_1(B),\quad h_3 = \varphi_3(T)\varphi_1(B),$$
$$h_4 = \varphi_1(T)\varphi_1(B),\quad h_5 = \varphi_1(T)\varphi_3(B),\quad h_6 = \varphi_2(T)\varphi_3(B),$$

where each $h_k = h_k(T, B \mid M \in (0, 0.25))$, and the Legendre multiwavelet bases are

h[{T,B\M e (0,0.25)) = 0^(r)0\B), h^{T,B\M E (0, 0.25)) = 0^(r)02(B) 
h^{T,B\ME{Q,Q.2h)) = i:^T)<P^(B), h^{T,B\M E {Q,Q.2^)) ^ ^\T)<i>\B) 
h'^{T, B\M e (0, 0.25)) = 'iJj^{T)(l>^{B), h'f^{T, B\M e (0, 0.25)) = iIj^{T)(I)'^{B) 
The mean values of the orthonormal polynomial basis functions, which constrain the minimum information copula, are

$$a_1 = -0.2995,\ a_2 = -0.1240,\ a_3 = -0.1634,\ a_4 = -0.0317,\ a_5 = -0.0585,\ a_6 = -0.0630,$$

and the corresponding expectations for the Legendre multiwavelet bases are

$$a_1 = 0.4803,\ a_2 = 0.2298,\ a_3 = 0.0539,\ a_4 = 0.0531,\ a_5 = 0.0011,\ a_6 = -0.0098.$$
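The binning step can be sketched as grouping the conditional copula data by the bin of the conditioning variable(s) and averaging each basis function within each bin. The sketch assumes equal-width bins on $[0, 1]$ and that the pairs $(u, v)$ have already been transformed to the conditional copula scale; the names are hypothetical. Passing a pair $(m, b)$ instead of $(m,)$ as the conditioning values gives the 16 bins used for the third tree.

```python
# Illustrative sketch; equal-width bins and all names are assumptions.


def bin_index(values, n_bins):
    """Equal-width bin index on [0, 1] for each conditioning value (1.0 goes to the last bin)."""
    return tuple(min(int(x * n_bins), n_bins - 1) for x in values)


def binned_constraints(records, basis, n_bins=4):
    """records: (u, v, conditioning_values) triples, e.g. conditioning_values = (m,) for T2
    or (m, b) for T3. Returns {bin: (count, [mean of h(u, v) for h in basis])}."""
    groups = {}
    for u, v, cond in records:
        groups.setdefault(bin_index(cond, n_bins), []).append((u, v))
    return {key: (len(pts), [sum(h(u, v) for u, v in pts) / len(pts) for h in basis])
            for key, pts in sorted(groups.items())}
```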

We repeat this process for the remaining bins. Tables 5 and 6 show the mean values or constraints (denoted by $a_i$) and the corresponding Lagrange multipliers ($\lambda_i$) required to build the conditional minimum information copula between T|M and B|M for the orthonormal polynomial and Legendre multiwavelet bases, respectively. The log-likelihood of the approximated copula in each bin is also reported in these tables.

The log-likelihood of the resulting minimum information copula, summed over all bins, is 47.54 for the orthonormal polynomial bases and 58.41 for the Legendre multiwavelets, whereas it is only 29.242 for the ordinary polynomial bases, which indicates the superiority of the proposed bases.

The conditional minimally informative copula in the third tree, $T_3$, is obtained similarly by dividing the support of each conditioning variable into four bins. The minimum information copula between T|B,M and S|B,M is then calculated on each combination of bins for M and B, which gives 16 bins altogether for this tree. The bins, bases and log-likelihoods of each copula based on the orthonormal polynomial and Legendre multiwavelet bases are given in Tables 7 and 8, respectively.

For comparison, the log-likelihood of the overall vine obtained by Bedford et al. (2012) with ordinary polynomial bases, found by summing the log-likelihoods of the component copulas, is 388.859.

The log-likelihoods of the overall pair-copula model using the orthonormal polynomial and the Legendre multiwavelet bases, obtained by adding the log-likelihoods of the copulas constructed above, are 434.135 and 552.25, respectively. These values are considerably greater than the log-likelihoods of the pair-copula models fitted to the data with the Gaussian copula and the t-copula, and of the approximated pair-copula model based on the ordinary polynomial bases.

6.1 Comparison to Other Approaches

In this subsection, we compare our method with other methods used to approximate the multivariate distribution of the Norwegian financial returns data. To do so, we compute the log-likelihood of the density approximated by the method presented in this paper and by the approaches reported in Aas et al. (2009) and Bedford et al. (2012). The log-likelihoods of the overall pair-copula model using the orthonormal polynomial and Legendre multiwavelet
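Bin | Base | $a_i$ | $\lambda_i$ | Log-likelihood
0.25 < M < 0.5 | $\varphi_4(T)\varphi_2(B)$ | -0.0621 | -0.0094 |
0.5 < M < 0.75 | $\varphi_2(T)\varphi_1(B)$ | 0.1184 | 0.1679 | 9.74
 | $\varphi_1(T)\varphi_3(B)$ | -0.1080 | -0.2311 |
 | $\varphi_2(T)\varphi_4(B)$ | 0.0956 | 0.1459 |
 | $\varphi_1(T)\varphi_5(B)$ | -0.0815 | -0.2047 |
 | $\varphi_1(T)\varphi_2(B)$ | -0.0627 | -0.1869 |
 | $\varphi_3(T)\varphi_1(B)$ | 0.0245 | 0.1253 |
0.75 < M < 1 | $\varphi_1(T)\varphi_1(B)$ | -0.2659 | -0.3177 | 10.53
 | $\varphi_2(T)\varphi_4(B)$ | 0.1568 | 0.1135 |
 | $\varphi_4(T)\varphi_1(B)$ | 0.1025 | 0.1290 |
 | $\varphi_1(T)\varphi_5(B)$ | -0.0079 | 0.0526 |
 |  | -0.1737 | -0.1007 |
 | $\varphi_3(T)\varphi_3(B)$ | -0.0376 | 0.0456 |

Table 5: Minimally informative copula for orthonormal basis between T and B given M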
Bin | Base | $a_i$ | $\lambda_i$ | Log-likelihood
0.25 < M < 0.5 |  |  |  | 10.46
0.5 < M < 0.75 |  | 0.118 | -269.93 | 11.30
 |  | -0.1080 | -439.13 |
 |  | -0.102 | 104.95 |
 |  | 0.096 | -29.99 |
 |  | 0.093 | 373.59 |
 |  | 0.059 | -14.53 |
0.75 < M < 1 |  | -0.2659 | 247.49 | 12.86
 |  | 0.1568 | -110.67 |
 |  | 0.1025 | -108.01 |
 |  | 0.069 | -222.75 |
 |  | 0.021 | -175.39 |
 |  | 0.063 | -15.69 |

Table 6: Minimum information copula for Legendre multiwavelets between T and B given M
Interval | Bases | Log-likelihood
0 < M < 0.25, 0 < B < 0.25 |  | 8.93
0 < M < 0.25, 0.25 < B < 0.5 |  | 7.31
0 < M < 0.25, 0.5 < B < 0.75 |  | 6.81
0 < M < 0.25, 0.75 < B < 1 |  | 9.65
0.25 < M < 0.5, 0 < B < 0.25 |  | 8.63
0.25 < M < 0.5, 0.25 < B < 0.5 |  | 7.67
0.25 < M < 0.5, 0.5 < B < 0.75 |  | 9.5
0.25 < M < 0.5, 0.75 < B < 1 |  | 5.62
0.5 < M < 0.75, 0 < B < 0.25 |  | 4.93
0.5 < M < 0.75, 0.25 < B < 0.5 |  | 10.49
0.5 < M < 0.75, 0.5 < B < 0.75 |  | 8.97
0.5 < M < 0.75, 0.75 < B < 1 | $\varphi_3\varphi_1, \varphi_3\varphi_3, \varphi_4\varphi_1, \varphi_2\varphi_3, \varphi_1\varphi_4, \varphi_2\varphi_4$ | 10.08
0.75 < M < 1, 0 < B < 0.25 |  | 3.7
0.75 < M < 1, 0.25 < B < 0.5 |  | 8.7
0.75 < M < 1, 0.5 < B < 0.75 |  | 5.61
0.75 < M < 1, 0.75 < B < 1 | $\varphi_2\varphi_2, \varphi_2\varphi_1, \varphi_1\varphi_5, \varphi_4\varphi_1, \varphi_3\varphi_2$ | 20.24

Table 7: Minimum information copula for orthonormal basis between T and S given M and B



bases, obtained by adding the log-likelihoods of the component copulas presented above, are 434.135 and 552.25, respectively. These values are much greater than the 291.801 obtained with the t-copula examined by Aas et al. (2009) and the 388.859 obtained with the minimum information copula based on the ordinary polynomial bases of Bedford et al. (2012). If we use only five bases to approximate the multivariate density of interest, the log-likelihoods for the orthonormal polynomial and Legendre multiwavelet bases are 429.3982 and 446.235, respectively, which are still clearly better than the model of Bedford et al. (2012) based on six ordinary polynomial bases. We have computed the log-likelihood of the data sample for five different copula models on the same vine structure: the Gaussian copula, the t-copula used by Aas et al. (2009), the minimum information copula with the ordinary polynomial bases presented by Bedford et al. (2012), and our two approximated copulas. The results are given in Table 9.

7 Conclusion 

In this paper, we extend the method of Bedford et al. (2012) for approximating a multivariate distribution, using any vine structure, to any required degree of accuracy. The main
Interval | Bases | Log-likelihood
0 < M < 0.25, 0 < B < 0.25 |  | 10.64
0 < M < 0.25, 0.25 < B < 0.5 |  | 9.42
0 < M < 0.25, 0.5 < B < 0.75 |  | 13.67
0 < M < 0.25, 0.75 < B < 1 |  | 10.42
0.25 < M < 0.5, 0 < B < 0.25 |  | 12.99
0.25 < M < 0.5, 0.25 < B < 0.5 |  | 15.67
0.25 < M < 0.5, 0.5 < B < 0.75 |  | 10.56
0.25 < M < 0.5, 0.75 < B < 1 |  | 10.77
0.5 < M < 0.75, 0 < B < 0.25 |  | 9.89
0.5 < M < 0.75, 0.25 < B < 0.5 |  | 10.26
0.5 < M < 0.75, 0.5 < B < 0.75 |  | 14.01
0.5 < M < 0.75, 0.75 < B < 1 |  | 17.97
0.75 < M < 1, 0 < B < 0.25 |  | 11.17
0.75 < M < 1, 0.25 < B < 0.5 |  | 14.31
0.75 < M < 1, 0.5 < B < 0.75 |  | 10.61
0.75 < M < 1, 0.75 < B < 1 |  | 24.39

Table 8: Minimally informative copula for Legendre multiwavelets between T and S given M and B

idea to implement this approximation method is to use the minimum information copulae that can 

be determined to aiw required degree of precision based on the data available. To approximate a 
multivariate distribution by this method, we need to specify: 1) a vine structure; 2) a basis family; 
3) for each part of vine, expected values for the certain functions associated with some constraints 
on each pairwise copula. 
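The three ingredients listed above can be illustrated for a single pair copula with a minimal Python sketch. It assumes the minimum information form, a copula density proportional to d1(u) d2(v) exp(Σ_i λ_i h_i(u, v)), discretized on an n × n grid, and finds the row and column scalings d1, d2 by alternating row and column normalization (Sinkhorn scaling, also known as the D1AD2 algorithm). The function names, grid size, tolerance and the example multiplier are our own choices; the outer step of solving for the λ_i that reproduce the specified expectations is not shown.

```python
import numpy as np


def grid_midpoints(n):
    """Midpoints of an n x n grid on the unit square, as (U, V) arrays."""
    mid = (np.arange(n) + 0.5) / n
    return np.meshgrid(mid, mid, indexing="ij")


def min_info_copula_grid(basis, lambdas, n=50, tol=1e-12, max_iter=10_000):
    """Discretized copula proportional to d1(u) d2(v) exp(sum_i lambda_i h_i(u, v)).

    Returns an n x n matrix of cell probabilities whose rows and columns
    each sum to 1/n, i.e. a discrete copula with uniform margins.
    """
    U, V = grid_midpoints(n)
    exponent = np.zeros((n, n))
    for lam, h in zip(lambdas, basis):
        exponent += lam * h(U, V)
    A = np.exp(exponent)
    target = np.full(n, 1.0 / n)
    d2 = np.ones(n)
    for _ in range(max_iter):
        d1 = target / (A @ d2)      # rescale so row sums equal 1/n
        d2 = target / (A.T @ d1)    # rescale so column sums equal 1/n
        P = d1[:, None] * A * d2[None, :]
        if np.max(np.abs(P.sum(axis=1) - target)) < tol:
            break
    return P


def basis_expectations(P, basis):
    """Expected value of each basis function under the discretized copula P."""
    U, V = grid_midpoints(P.shape[0])
    return np.array([np.sum(P * h(U, V)) for h in basis])


# Usage: one hypothetical basis function h_1(u, v) = u v with multiplier 3.
basis = [lambda u, v: u * v]
P = min_info_copula_grid(basis, [3.0])
print(basis_expectations(P, basis))  # exceeds 0.25, the value under independence
```

Increasing λ_1 moves E[UV] monotonically away from its independence value, which is what makes it possible to match a target expectation by a one-dimensional search for each constraint.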

Type of copula                                                          Log-likelihood
Gaussian copula (Aas et al. 2009)                                       263.5052
t copula (Aas et al. 2009)                                              291.8014
Minimum information copula based on polynomial basis (Bedford et al., 2012)   388.859
Minimum information copula based on orthonormal polynomial              434.135
Minimum information copula based on Legendre multiwavelets              552.25

Table 9: Comparison between different models.

Bedford et al. (2012) approximate all conditional copulas using linear combinations of ordinary polynomial basis functions. We make this approximation more precise by choosing a more appropriate basis family; in this paper, we concentrate on orthonormal polynomial basis functions and Legendre multiwavelets. These bases are shown to be more convenient for computation than some other natural bases. A very useful property of an orthonormal polynomial basis is that adding a new term to the expansion does not change the coefficients of the shorter expansion already found. This is not the case for a non-orthonormal basis, where a new term in general has a nonzero projection onto the previous terms, so the coefficients already found would have to be recomputed. The Legendre multiwavelet basis not only has this property, but it also makes the computation of the minimum information copula even faster and improves the approximation considerably. In other words, using these bases is important for three main reasons: first, less computation time is required to approximate the minimum information copula of interest; second, the models fitted to the data using minimum information copulas based on the orthonormal polynomial and Legendre multiwavelet bases are better, in the sense that their log-likelihoods are much larger than those of the alternative models; third, the approximations made in this paper are robust, in the sense that they are not sensitive to the initial values chosen for the parameters.
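The coefficient-stability property of an orthonormal basis can be checked numerically. This Python sketch (an illustration, not the authors' code) uses the orthonormal shifted Legendre polynomials sqrt(2k + 1) P_k(2x − 1) on [0, 1] and compares them with the monomials 1, x, x², …; the test function exp(x) and the 40-point Gauss-Legendre quadrature are arbitrary choices. With the orthonormal basis each coefficient is a single inner product, so extending the expansion leaves the earlier coefficients untouched; with monomials the coefficients come from a coupled Gram (Hilbert-matrix) system and shift when a term is added. Products p_i(u) p_j(v) of these polynomials are orthonormal on [0, 1]², the setting of bivariate copula densities.

```python
import numpy as np
from numpy.polynomial import legendre

# Gauss-Legendre nodes and weights, mapped from [-1, 1] to [0, 1].
_nodes, _weights = legendre.leggauss(40)
X = (_nodes + 1.0) / 2.0
W = _weights / 2.0


def shifted_legendre(k, x):
    """Orthonormal polynomial sqrt(2k + 1) * P_k(2x - 1) on [0, 1]."""
    coeffs = np.zeros(k + 1)
    coeffs[k] = 1.0
    return np.sqrt(2 * k + 1) * legendre.legval(2.0 * x - 1.0, coeffs)


def orthonormal_coeffs(f, K):
    """Projection coefficients <f, p_k> for k < K; each is computed independently."""
    fx = f(X)
    return np.array([np.sum(W * fx * shifted_legendre(k, X)) for k in range(K)])


def monomial_coeffs(f, K):
    """L2 best-approximation coefficients of f on 1, x, ..., x^(K-1): a coupled Gram system."""
    fx = f(X)
    gram = np.array([[np.sum(W * X ** (i + j)) for j in range(K)] for i in range(K)])
    rhs = np.array([np.sum(W * fx * X ** i) for i in range(K)])
    return np.linalg.solve(gram, rhs)


# Usage: expand f(x) = exp(x) with three and then four terms.
print(orthonormal_coeffs(np.exp, 3), orthonormal_coeffs(np.exp, 4)[:3])  # identical
print(monomial_coeffs(np.exp, 3), monomial_coeffs(np.exp, 4)[:3])        # different
```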

In addition to these properties, our method can be used to build arbitrarily good approximations to the original distribution. One of the clearest sources of potential error in our approximation is the choice of basis, since it is convenient to take only a small number of functions h_i. The terms chosen in both the orthonormal polynomial and Legendre multiwavelet bases generate asymmetric copulas, which appears to have a considerable impact when modelling general data sets. Using a large number of functions does give more accuracy, at the cost of considerable extra computation at the construction stage but at no extra cost at the sampling stage. Indeed, with the bases proposed in this paper we can approximate the required model more precisely, using fewer basis functions and less computation time than the alternative methods. In fact, the generalization made in this paper gives natural ways to generate asymmetric copulas and simple ways to specify non-constant conditional correlations (or other moments). We are currently investigating alternatives to the stepwise method used in this paper for finding optimal basis functions, in the sense of attaining the largest log-likelihood with the smallest number of bases.
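The point that extra basis functions add no cost at the sampling stage can be illustrated under one assumption: that the copula is stored on a grid of cell probabilities, as in a discretized minimum information construction. However many terms λ_i h_i went into building the grid, sampling only requires drawing cells and a uniform point inside each. In the following Python sketch, the grid size, the two-term kernel exp(3uv + 0.5u²v) and all function names are hypothetical.

```python
import numpy as np


def sinkhorn_copula(A, tol=1e-12, max_iter=10_000):
    """Scale a positive n x n kernel so every row and column sums to 1/n."""
    n = A.shape[0]
    target = np.full(n, 1.0 / n)
    d2 = np.ones(n)
    for _ in range(max_iter):
        d1 = target / (A @ d2)
        d2 = target / (A.T @ d1)
        P = d1[:, None] * A * d2[None, :]
        if np.max(np.abs(P.sum(axis=1) - target)) < tol:
            break
    return P


def sample_grid_copula(P, size, rng):
    """Sample (u, v): choose cell (i, j) with probability P[i, j], then a uniform point in it."""
    n = P.shape[0]
    probs = P.ravel() / P.sum()
    cells = rng.choice(n * n, size=size, p=probs)
    i, j = np.divmod(cells, n)              # row-major flattening: cell = i * n + j
    u = (i + rng.random(size)) / n
    v = (j + rng.random(size)) / n
    return u, v


# Construction: the cost grows with the number of basis terms in the exponent.
n = 40
mid = (np.arange(n) + 0.5) / n
U, V = np.meshgrid(mid, mid, indexing="ij")
A = np.exp(3.0 * U * V + 0.5 * U ** 2 * V)  # two hypothetical terms lambda_i h_i
P = sinkhorn_copula(A)

# Sampling: the cost depends only on the grid, not on how many h_i were used.
rng = np.random.default_rng(1)
u, v = sample_grid_copula(P, 20_000, rng)
```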

The method used in this paper is very flexible, and any functions can be used to construct the minimum information copulas used here. It can also be used for modelling more complex applications in which the basis functions must be computed by computer codes. Because constructing the minimum information distribution requires numerous evaluations of these functions, the computation, and hence the approximation, would then become infeasible. One way to ease the computation and reduce the complexity of the model is to use Gaussian process emulators.
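As a minimal sketch of the Gaussian process emulator suggestion, assuming a zero-mean GP with a squared-exponential kernel and fixed, hand-chosen hyperparameters, the following Python code fits a GP to a few runs of an "expensive" one-dimensional function and predicts it at new inputs. The function sin(2πx) is a hypothetical stand-in for a basis function computed by simulation code; in practice the hyperparameters would be estimated, for example by maximizing the marginal likelihood.

```python
import numpy as np


def sq_exp_kernel(a, b, length_scale, variance):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    return variance * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale ** 2)


class GPEmulator:
    """Zero-mean GP regression with fixed hyperparameters (a minimal emulator)."""

    def __init__(self, length_scale=0.2, variance=1.0, jitter=1e-6):
        self.length_scale = length_scale
        self.variance = variance
        self.jitter = jitter

    def fit(self, x, y):
        self.x = np.asarray(x, dtype=float)
        K = sq_exp_kernel(self.x, self.x, self.length_scale, self.variance)
        self.chol = np.linalg.cholesky(K + self.jitter * np.eye(len(self.x)))
        # alpha = (K + jitter I)^{-1} y, using the Cholesky factor twice
        inner = np.linalg.solve(self.chol, np.asarray(y, dtype=float))
        self.alpha = np.linalg.solve(self.chol.T, inner)
        return self

    def predict(self, x_new):
        """Posterior mean and variance at new inputs."""
        x_new = np.asarray(x_new, dtype=float)
        k_star = sq_exp_kernel(x_new, self.x, self.length_scale, self.variance)
        mean = k_star @ self.alpha
        w = np.linalg.solve(self.chol, k_star.T)
        var = np.maximum(self.variance - np.sum(w ** 2, axis=0), 0.0)
        return mean, var


def expensive(x):
    """Hypothetical stand-in for a basis function evaluated by costly computer code."""
    return np.sin(2.0 * np.pi * x)


# Usage: 21 "runs" of the expensive code, then cheap predictions elsewhere.
design = np.linspace(0.0, 1.0, 21)
gp = GPEmulator().fit(design, expensive(design))
mean, var = gp.predict(np.array([0.37, 0.52, 0.81]))
```

Once fitted, each prediction costs a few small matrix-vector products, so the many evaluations needed to build a minimum information copula can be made against the emulator instead of the original code.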

Acknowledgement: The authors are grateful to Professor Tim Bedford for his helpful comments on some parts of the paper.